CN116069900A - Data source identification method, device, system, electronic device and storage medium - Google Patents

Data source identification method, device, system, electronic device and storage medium

Info

Publication number
CN116069900A
Authority
CN
China
Prior art keywords
data source
log
data
identification result
rule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211480352.2A
Other languages
Chinese (zh)
Inventor
秦向阳
查超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DBAPPSecurity Co Ltd
Original Assignee
DBAPPSecurity Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DBAPPSecurity Co Ltd
Priority to CN202211480352.2A
Publication of CN116069900A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application relates to a data source identification method, device, system, electronic device, and storage medium. The data source identification method comprises: acquiring log data and transmitting the log data to a message cluster to obtain log messages; acquiring a data source rule and performing data source identification matching on the log messages at least according to the data source rule to obtain a log identification result; and aggregating the log identification result by data source, so that entries from the same data source are grouped together, to obtain and store a data source identification result. The application addresses the slow execution and low efficiency of data source division and achieves efficient and accurate identification of data sources.

Description

Data source identification method, device, system, electronic device and storage medium
Technical Field
The present application relates to the field of data processing, and in particular, to a data source identification method, device, system, electronic device, and storage medium.
Background
With the rapid development of technologies such as big data, cloud platforms, and the Internet of Things, more and more data is being generated, much of it in the form of logs, so users must analyze large volumes of log data to realize its engineering value. Because the volume of log data is so large, however, extracting and analyzing it is heavy and complicated work.
In the related art, to make log data easier to extract and analyze, offline log data is generally classified by type so that it can be managed by category. Against the background of big data and the Internet of Things, however, log data is real-time streaming data, data sources change frequently, and data volume grows exponentially. The resulting flood of data makes category boundaries more ambiguous, and dividing data sources offline involves a heavy workload, executes slowly, and is inefficient, falling far short of current engineering requirements.
No effective solution has yet been proposed for the slow execution and low efficiency of data source division in the related art.
Disclosure of Invention
The embodiment provides a data source identification method, a device, a system, an electronic device and a storage medium, so as to solve the problems of low execution speed and low efficiency of data source division in the related technology.
In a first aspect, in this embodiment, there is provided a data source identification method, including:
acquiring log data, and transmitting the log data to a message cluster to obtain log messages;
acquiring a data source rule, and performing data source identification matching on the log message at least according to the data source rule to obtain a log identification result;
and carrying out the same data source aggregation on the log identification result according to the data source of the log identification result, obtaining and storing the data source identification result.
In some embodiments, the performing data source identification matching on the log message at least according to the data source rule to obtain a log identification result includes:
acquiring a sliding time window, acquiring a current window log message from the message cluster according to the sliding time window, and performing identification matching on the current window log message and the data source rule to obtain a current log identification result;
acquiring a next window log message according to the sliding time window, and identifying and matching the next window log message with the data source rule to obtain a next log identification result;
and obtaining the log identification result according to the current log identification result and the next log identification result.
In some embodiments, performing the same data source aggregation on the log identification result according to the data source of the log identification result, and obtaining and storing the data source identification result, includes:
according to the sliding time window and the data source of the current log identification result, the same data source aggregation is carried out on the current log identification result, and the current data source identification result is obtained;
according to the sliding time window and the data source of the next log identification result, the same data source aggregation is carried out on the next log identification result, and the next data source identification result is obtained;
and obtaining the data source identification result according to the current data source identification result and the next data source identification result.
In some embodiments, the transmitting the log data to a message cluster to obtain a log message includes:
acquiring a log regular rule, and carrying out standardized processing on the log data according to the log regular rule to obtain standardized log data;
and transmitting the standardized log data to the message cluster to obtain the log message.
In some embodiments, acquiring the data source rule and performing data source identification matching on the log message at least according to the data source rule to obtain a log identification result includes:
acquiring a current data source rule, and storing the current data source rule into a data source rule catalog;
and monitoring the data source rule catalog, and under the condition of monitoring the next data source rule, carrying out data source identification matching on the log message at least according to the current data source rule and the next data source rule to obtain a log identification result.
In some of these embodiments, before the log data is acquired, the method further includes:
acquiring initialization configuration information of the message cluster, and starting a distributed coordination management service;
and synchronizing the initialization configuration information of the message cluster to the distributed coordination management service.
In a second aspect, in this embodiment, there is provided a data source identification apparatus, the apparatus including: the device comprises an acquisition module, an identification module and an aggregation module;
the acquisition module is used for acquiring log data and transmitting the log data to a message cluster to obtain log messages;
the identification module is used for acquiring a data source rule, and carrying out data source identification matching on the log message at least according to the data source rule to obtain a log identification result;
and the aggregation module is used for carrying out the same data source aggregation on the log identification result according to the data source of the log identification result, obtaining and storing the data source identification result.
In a third aspect, in this embodiment, there is provided a data source identification system, the system comprising: terminal equipment, transmission equipment and server equipment; the terminal equipment is connected with the server equipment through the transmission equipment;
the server device is configured to implement the data source identification method described in the first aspect;
the transmission device is used for sending the log data and the data source rule to the server device;
the terminal equipment is used for acquiring log data and data source rules.
In a fourth aspect, in this embodiment, there is provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the data source identification method described in the first aspect.
In a fifth aspect, in this embodiment, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the data source identification method of the first aspect described above.
Compared with the related art, the data source identification method, device, system, electronic device, and storage medium provided in this embodiment acquire log data and transmit it to a message cluster to obtain log messages; acquire a data source rule and perform data source identification matching on the log messages at least according to the data source rule to obtain a log identification result; and aggregate the log identification result by data source to obtain and store the data source identification result. This solves the problems of slow execution and low efficiency of data source division and achieves fast and accurate identification of log data sources.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the other features, objects, and advantages of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is an application scenario diagram of a data source identification method in one embodiment;
FIG. 2 is a flow chart of a method of data source identification in one embodiment;
FIG. 3 is a flow chart of a method of identifying data sources in another embodiment;
FIG. 4 is a flow chart of log collection and normalization steps in one embodiment;
FIG. 5 is a flow chart of a data source synchronization step in one embodiment;
FIG. 6 is a block diagram of a data source identification device in one embodiment;
FIG. 7 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
For a clearer understanding of the objects, technical solutions and advantages of the present application, the present application is described and illustrated below with reference to the accompanying drawings and examples.
Unless defined otherwise, technical or scientific terms used herein have the meaning commonly understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," "these," and the like in this application do not limit quantity and may be singular or plural. The terms "comprising," "including," "having," and any variations thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (units) is not limited to the listed steps or modules (units), but may include other steps or modules (units) not listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "A plurality" in this application means two or more. "And/or" describes an association between associated objects and covers three relationships; for example, "A and/or B" may mean: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates an "or" relationship between the associated objects. The terms "first," "second," "third," and the like merely distinguish similar objects and do not represent a particular ordering of objects.
The data source identification method provided by the application can be applied to the application environment shown in FIG. 1, in which the terminal device 102 communicates with the server device 104 via a network. The server device 104 acquires log data and transmits the log data to the message cluster to obtain log messages; acquires the data source rule and performs data source identification matching on the log messages at least according to the data source rule to obtain a log identification result; and aggregates the log identification result by data source to obtain and store the data source identification result. The terminal device 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device, and the server device 104 may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.
In this embodiment, a data source identification method is provided. The method may be deployed on a big data stream computing engine of the server device 104; the engine may be Apex, Ballista, Flink, Spark Streaming, or Storm, and is preferably Flink. FIG. 2 is a flowchart of the data source identification method of this embodiment. As shown in FIG. 2, the flow includes the following steps:
Step S202, acquiring log data and transmitting the log data to a message cluster to obtain log messages. The log data may be a terminal log, or a task or event log. The message cluster is a publish/subscribe message system for big data, for example Kafka, Pulsar, or Pravega. Before the log data is transmitted to the message cluster, the message cluster must be started and initialized: its replicas (Replica) and partitions (Partition) are configured according to the traffic of the log data and the attribute information of the server device 104, which improves the consumption speed of Kafka.
Specifically, taking Kafka as the message cluster used for data source identification, the server device 104 acquires log data, starts the Kafka service, opens the Kafka message cluster, and configures the replicas and partitions of Kafka. Once the Kafka configuration is complete, the log data is transmitted to the Kafka message cluster to obtain log messages.
Step S204, acquiring a data source rule, and performing data source identification matching on the log messages at least according to the data source rule to obtain a log identification result. The data source rule is either preset or input by a user; it may be a field association rule or any other rule that identifies a data source. Data source identification matching means identifying and matching the source of a log message according to the identification logic of the data source rule and attaching a data source field. The data source field may be Linux, Windows, a terminal, or another extensible device or system. Log messages may be matched line by line.
Specifically, taking the big data stream computing engine Flink as an example, the complex event processing (Complex Event Processing, CEP for short) module on Flink acquires the data source rule and acquires log messages from the Kafka message cluster. Through its internal pattern application programming interface (Pattern Application Programming Interface, Pattern API for short), the CEP module matches the log messages line by line at least according to the data source rule and attaches a data source field; once every line has been matched, the log identification result is obtained.
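The line-by-line matching described in step S204 can be illustrated with a minimal sketch. It is written in plain Python rather than against the Flink CEP Pattern API, and the rule shape (`DataSourceRule`), the sample patterns, and the fallback value `"unknown"` are assumptions made for illustration; the application does not specify the rule syntax.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSourceRule:
    """Hypothetical rule: a regex plus the data source value it identifies."""
    data_source: str
    pattern: re.Pattern


# Illustrative rules only; real rules would be preset or supplied by a user.
RULES = [
    DataSourceRule("Linux", re.compile(r"\b(sshd|systemd|kernel)\b")),
    DataSourceRule("Windows", re.compile(r"\bEventID=\d+")),
]


def identify_line(line, rules):
    """Match one log line against the rules and attach a data source field."""
    for rule in rules:
        if rule.pattern.search(line):
            return {"message": line, "data_source": rule.data_source}
    # No rule matched: keep the line but mark its source as unknown.
    return {"message": line, "data_source": "unknown"}


def identify(lines, rules=RULES):
    """Apply identification matching line by line to a batch of log messages."""
    return [identify_line(line, rules) for line in lines]
```

The first matching rule wins in this sketch; a real deployment might instead rank rules or combine several fields, which the application leaves open.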
Step S206, aggregating the log identification result by data source, according to the data source of each log identification result, to obtain and store the data source identification result. The data source identification result may be stored in a columnar storage database, for example HBase, ClickHouse, Druid, or HP Vertica. Before the aggregation, the log identification result may be partitioned according to the sliding time window, and the same-data-source aggregation is then performed within each sliding time window; the sliding time window thus filters the data and improves the accuracy of data source identification.
Specifically, fig. 3 is a flowchart of another data source identification method according to the present embodiment, and as shown in fig. 3, the flowchart includes the following steps:
Step S302, initializing Kafka. The server device 104 starts the message cluster and initializes it, configuring the replicas (Replica) and partitions (Partition) of the message cluster according to the traffic of the log data and the attribute information of the server device 104 to improve the consumption speed of Kafka.
Step S304, the message cluster acquires log data. The server device 104 obtains log data and transmits it to the message cluster to obtain log messages.
Step S306, acquiring a data source rule. The server device 104 obtains the data source rule.
Step S308, data source identification matching. The server device 104 performs data source identification matching on the log messages at least according to the data source rule to obtain a log identification result.
Step S310, same data source aggregation. The server device 104 aggregates the log identification result by data source to obtain the data source identification result.
Step S312, storing to ClickHouse. The server device 104 stores the data source identification result in ClickHouse.
Through the above steps, streaming the real-time log data through the message cluster improves the efficiency of operating on large volumes of log data. The data source identification method divides log data quickly and efficiently, helping users carry out user behavior analysis, device anomaly analysis and early warning, single or multiple time-series anomaly analysis, and similar tasks per log data source, so that the data yields greater value. This solves the problems of slow execution and low efficiency of data source division.
In some embodiments, performing data source identification matching on the log message at least according to the data source rule to obtain a log identification result, including:
acquiring a sliding time window, acquiring a current window log message from the message cluster according to the sliding time window, and carrying out data source identification matching on the current window log message and the data source rule to obtain a current log identification result;
acquiring a next window log message according to the sliding time window, and carrying out data source identification matching on the next window log message and the data source rule to obtain a next log identification result;
and obtaining the log identification result according to the current log identification result and the next log identification result.
The sliding time window is defined by a fixed window length and a sliding interval, so consecutive windows may overlap. A window cuts the unbounded stream into finite-size buckets for further processing, distributing the stream data among them. In this embodiment, a bucket is a set held in memory.
Specifically, the CEP module of Flink on the server device 104 acquires a preset sliding time window, for example with a window length of 60 s and an interval of 10 s. The CEP module acquires the current window log messages within the current 60 s from the message cluster according to the sliding time window, performs data source identification matching on them line by line against the data source rule, and obtains the current log identification result from the matching result of each line. After the 10 s interval, the CEP module acquires the next window log messages for the next 60 s according to the sliding time window and performs data source identification matching on them against the data source rule to obtain the next log identification result. The server device 104 obtains the log identification result from the current log identification result and the next log identification result.
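The overlapping windows in this example can be made concrete with a short sketch. It assumes integer timestamps in seconds, window starts aligned to multiples of the sliding interval, and half-open windows `[start, end)`; the function names are hypothetical and the code stands in for, rather than reproduces, the stream engine's own window assigner.

```python
from collections import defaultdict


def assign_windows(ts, size=60, slide=10):
    """Return every (start, end) window that contains timestamp ts.

    Windows start at multiples of `slide` and cover [start, start + size).
    """
    last_start = ts - (ts % slide)
    # Walk backwards over window starts while the window still covers ts.
    return [(start, start + size) for start in range(last_start, ts - size, -slide)]


def bucket_by_window(messages, size=60, slide=10):
    """Distribute (timestamp, line) pairs into in-memory buckets, one per window."""
    buckets = defaultdict(list)
    for ts, line in messages:
        for window in assign_windows(ts, size, slide):
            buckets[window].append(line)
    return dict(buckets)
```

With a 60 s length and a 10 s interval, each message lands in six windows, which is why a later window re-examines part of the data already seen by the previous one.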
Through the above steps, the sliding time window bounds both the volume of log messages acquired at a time and the volume of identified data written to the database, which improves the efficiency of log data analysis and of batch insertion into the database. Because the sliding time windows overlap, each window is identified together with part of the historical data from the previous window, so the same data undergoes data source identification several times, which improves identification accuracy. This solves the problems of slow execution, low efficiency, and low accuracy of data source division.
In some embodiments, the same data source aggregation is performed on the log identification result according to the data source of the log identification result, so as to obtain and store the data source identification result, including:
according to the sliding time window and the data source of the current log identification result, carrying out the same data source aggregation on the current log identification result to obtain the current data source identification result;
according to the sliding time window and the data source of the next log identification result, the same data source aggregation is carried out on the next log identification result, and the next data source identification result is obtained;
and obtaining the data source identification result according to the current data source identification result and the next data source identification result.
Through the above steps, aggregating the log identification result by data source within each sliding time window makes clear which log data came from the same data source during a given period and distinguishes the data streams of different log data sources a second time. This improves the accuracy of data source identification, reduces the data dimension, and improves processing efficiency, solving the problems of slow execution, low efficiency, and low accuracy of data source division.
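As a rough illustration of per-window aggregation, the sketch below groups identified records by their data source field inside each window. The input shape (a mapping from a window to a list of records with `message` and `data_source` keys) is an assumption chosen for the example, not a format defined by the application.

```python
from collections import defaultdict


def aggregate_by_source(window_results):
    """Group identified log records by data source within each window.

    window_results maps a window key to a list of records, each a dict
    with "message" and "data_source" keys (hypothetical shape).
    """
    aggregated = {}
    for window, records in window_results.items():
        groups = defaultdict(list)
        for record in records:
            groups[record["data_source"]].append(record["message"])
        aggregated[window] = dict(groups)
    return aggregated
```

Each window's result is then a small mapping from data source to its messages, which is the kind of batch that could be inserted into a columnar store in one write.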
In some of these embodiments, transmitting the log data into a message cluster to obtain a log message includes:
acquiring a log regular rule, and carrying out standardized processing on the log data according to the log regular rule to obtain standardized log data;
transmitting the standardized log data into the message cluster to obtain the log message.
The standardized log data is data in a format that can be identified and loaded across platforms, for example JSON; the log regular rule is a regular expression that maps the log data to the JSON-format data.
Specifically, FIG. 4 is a schematic diagram of the log collection and standardization steps in this embodiment. As shown in FIG. 4, the flow includes the following steps:
Step S402, parsing log data. The server device 104 obtains log data from the terminal and/or the device and parses it to obtain machine language data that the server device 104 can read directly.
Step S404, acquiring a log regular rule. The server device 104 acquires the log regular rule.
Step S406, log standardization. The server device 104 performs JSON standardization on the machine language data according to the log regular rule to obtain standardized log data.
Step S408, transmitting to the message cluster. The server device 104 transmits the standardized log data to the Kafka message cluster to obtain the log messages.
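The standardization step can be sketched as a regular expression with named groups whose captures are serialized to JSON. The pattern below targets a simplified, syslog-like line and is purely an assumed example of a log regular rule; the field names (`host`, `program`, `pid`, `message`) and the `parsed` flag are likewise hypothetical.

```python
import json
import re

# Hypothetical log regular rule for lines such as
# "web01 sshd[812]: Accepted password for root".
LOG_RULE = re.compile(
    r"^(?P<host>\S+)\s+(?P<program>[\w\-/]+)(?:\[(?P<pid>\d+)\])?:\s+(?P<message>.*)$"
)


def normalize(raw_line, rule=LOG_RULE):
    """Convert one raw log line to a JSON string using the regular rule."""
    line = raw_line.strip()
    match = rule.match(line)
    if match is None:
        # Keep unparsed lines so nothing is silently dropped.
        return json.dumps({"raw": line, "parsed": False}, sort_keys=True)
    # Drop optional groups that did not participate in the match.
    fields = {k: v for k, v in match.groupdict().items() if v is not None}
    fields["parsed"] = True
    return json.dumps(fields, sort_keys=True)
```

The resulting JSON strings are what would be published to the message cluster in step S408.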
Through the steps, the log data are converted into machine language data which can be directly read, the machine language data are converted into standardized data which can be cross-platform identified and loaded, the message clusters are utilized to conduct streaming processing on the real-time standardized log data, the operation efficiency of a large amount of log data can be improved, and the problems of low data source division execution speed and low efficiency are solved.
In some embodiments, obtaining a data source rule, performing data source identification matching on the log message at least according to the data source rule, to obtain a log identification result, including:
acquiring a current data source rule and storing the current data source rule into a data source rule catalog;
and monitoring the data source rule catalog, and under the condition of monitoring the next data source rule, carrying out data source identification matching on the log message at least according to the current data source rule and the next data source rule to obtain a log identification result.
The data source rule catalog may be established in the distributed coordination management service, and the current data source rule and the next data source rule are stored in the data source rule catalog of that service. The distributed coordination management service is deployed on the server device 104 and may be ZooKeeper. The data source rule catalog may be monitored by the CEP module in Flink on the server device 104, so that when the CEP module detects a newly added next data source rule, it acquires that rule and performs data source matching on the log messages according to it.
Specifically, FIG. 5 is a schematic diagram of the data source synchronization steps in this embodiment. As shown in FIG. 5, the flow includes the following steps:
Step S502, acquiring a current data source rule. The server device 104 obtains the current data source rule.
Step S504, creating a current child node under the data source rule catalog of ZooKeeper. According to the current data source rule, a current child node is created under the data source rule catalog of ZooKeeper in the server device 104, and the child node stores the current data source rule.
Step S506, acquiring a next data source rule. ZooKeeper acquires the next data source rule.
Step S508, creating a next child node under the data source rule catalog of ZooKeeper. A next child node is created under the data source rule catalog, and the next child node stores the next data source rule.
Step S510, the CEP module monitors the data source rule catalog of ZooKeeper. The CEP module in Flink on the server device 104 monitors the data source rule catalog of ZooKeeper.
Step S512, synchronizing the next data source rule to the memory of the CEP module when the next data source rule or the next child node is detected. The server device 104 synchronizes the next data source rule into the memory of the CEP module.
Step S514, obtaining a log identification result. The server device 104 performs data source identification matching on the log messages according to the current data source rule, the next data source rule, and the sliding time window to obtain a log identification result.
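The synchronization flow of steps S502 to S514 can be mimicked with an in-memory stand-in for the rule catalog. This sketch does not use a ZooKeeper client; `RuleCatalog`, `Matcher`, and the keyword-based rule format are all hypothetical, and the callback is a simplification of a real watch mechanism.

```python
class RuleCatalog:
    """In-memory stand-in for a rule catalog in a coordination service."""

    def __init__(self):
        self._children = {}
        self._watchers = []

    def watch(self, callback):
        """Register a callback invoked whenever a child node is created."""
        self._watchers.append(callback)

    def create_child(self, name, rule):
        """Store a rule in a child node and notify all watchers."""
        self._children[name] = rule
        for callback in self._watchers:
            callback(name, rule)

    def children(self):
        return dict(self._children)


class Matcher:
    """Keeps an in-memory copy of the rules and matches log lines."""

    def __init__(self, catalog):
        self.rules = catalog.children()  # load rules already in the catalog
        catalog.watch(self._on_rule)     # then follow later additions

    def _on_rule(self, name, rule):
        # Synchronize the newly detected rule into memory.
        self.rules[name] = rule

    def identify(self, line):
        for rule in self.rules.values():
            if rule["keyword"] in line:
                return rule["data_source"]
        return "unknown"
```

A line that no rule recognizes is reported as `"unknown"` until a matching rule is added to the catalog, after which the same matcher instance identifies it without a restart.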
Through the above steps, the distributed coordination management service stores and manages the data source rules, and Flink's monitoring mechanism picks up rule updates, so updated data source rules are synchronized and new rules are tracked in real time. This improves the accuracy of data source matching and identification and solves the problem of low accuracy of data source division.
In some of these embodiments, prior to the obtaining the log data, further comprising:
acquiring the message cluster initialization configuration information and starting a distributed coordination management service;
and synchronizing the initialization configuration information of the message cluster to the distributed coordination management service.
Through the above steps, the initialization configuration information of the message cluster is synchronized to the distributed coordination management service, which can then act as a registry that manages the message cluster. Because the distributed coordination management service also stores the data source rules, it can act as the controller for synchronizing them. The message cluster and the stream processing engine Flink are thus coordinated and managed in a unified way, which makes data source identification more orderly.
It should be understood that, although the steps in the flowcharts of FIGS. 2-5 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to that order and may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-5 may include multiple sub-steps or stages, which are not necessarily performed at the same time and may be performed at different times; nor are these sub-steps or stages necessarily performed sequentially, as they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
This embodiment also provides a data source identification device, which is used to implement the above embodiments and preferred implementations; what has already been described is not repeated. The terms "module," "unit," "sub-unit," and the like as used below may refer to a combination of software and/or hardware that performs a predetermined function. Although the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 6 is a block diagram showing the structure of the data source identification device of this embodiment. As shown in Fig. 6, the device includes: an acquisition module 10, an identification module 20 and an aggregation module 30;
the acquisition module 10 is configured to acquire log data, and transmit the log data to a message cluster to obtain a log message;
the identification module 20 is configured to obtain a data source rule, and perform data source identification matching on the log message at least according to the data source rule, so as to obtain a log identification result;
the aggregation module 30 is configured to perform same-data-source aggregation on the log identification result according to the data source of the log identification result, so as to obtain and store the data source identification result.
For specific limitations of the data source identification device, reference may be made to the above limitations of the data source identification method, which are not repeated here. Each module in the above data source identification device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or be independent of, a processor in the computer device, or may be stored in software form in a memory of the computer device, so that the processor can call and execute the operations corresponding to each module.
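A software-only reading of the three modules might look like the following sketch. It rests on several assumptions that this application does not state: a `deque` stands in for the message cluster, each data source rule is a regular expression keyed by a data source name, unmatched messages are labeled "unknown", and the "stored" result is simply kept in memory.

```python
import re
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple


class AcquisitionModule:
    """Acquires log data and publishes it to a queue standing in for the message cluster."""

    def __init__(self, cluster: Deque[str]) -> None:
        self.cluster = cluster

    def acquire(self, log_lines: List[str]) -> None:
        for line in log_lines:
            self.cluster.append(line.strip())


class IdentificationModule:
    """Matches each log message against the data source rules (assumed to be regexes)."""

    def __init__(self, rules: Dict[str, str]) -> None:
        self.rules = {name: re.compile(p) for name, p in rules.items()}

    def identify(self, cluster: Deque[str]) -> List[Tuple[str, str]]:
        results = []
        while cluster:
            message = cluster.popleft()
            source = next(
                (name for name, rx in self.rules.items() if rx.search(message)),
                "unknown",  # assumed label for messages that match no rule
            )
            results.append((source, message))
        return results


class AggregationModule:
    """Groups log identification results by data source and keeps them in memory."""

    def __init__(self) -> None:
        self.store: Dict[str, List[str]] = {}

    def aggregate(self, results: List[Tuple[str, str]]) -> Dict[str, List[str]]:
        grouped: Dict[str, List[str]] = defaultdict(list)
        for source, message in results:
            grouped[source].append(message)
        self.store = dict(grouped)
        return self.store


cluster: Deque[str] = deque()
AcquisitionModule(cluster).acquire(
    ["fw01 deny tcp\n", "waf02 sql injection\n", "fw01 allow udp\n"]
)
identified = IdentificationModule(
    {"firewall": r"^fw\d+", "waf": r"^waf\d+"}
).identify(cluster)
stored = AggregationModule().aggregate(identified)
```

Each class corresponds to one module of Fig. 6, so swapping the queue for a real message cluster or the in-memory store for a database would only change one class in this sketch.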
There is also provided in this embodiment a data source identification system including: a terminal device 102, a transmission device, and a server device 104; wherein the terminal device 102 is connected to the server device 104 through a transmission device;
the server device 104 is configured to perform the steps of any of the method embodiments described above;
the transmission device is used for sending the log data and the data source rule to the server device;
the terminal device 102 is configured to obtain log data and data source rules.
There is also provided in this embodiment an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the above processor may be configured to execute the following steps by means of a computer program:
S1, acquiring log data, and transmitting the log data to a message cluster to obtain log messages.
S2, acquiring a data source rule, and carrying out data source identification and matching on the log message at least according to the data source rule to obtain a log identification result.
S3, carrying out the same data source aggregation on the log identification result according to the data source of the log identification result, obtaining and storing the data source identification result.
It should be noted that, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and optional implementations, which are not repeated here.
In addition, in combination with the data source identification method provided in the above embodiment, a storage medium may be provided in this embodiment. The storage medium has a computer program stored thereon; the computer program, when executed by a processor, implements any of the data source identification methods of the above embodiments.
It should be noted that, user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing data source identification result data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a data source identification method.
It will be appreciated by those skilled in the art that the structure shown in fig. 7 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer readable storage medium; when executed, the program may carry out the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to limit it. All other embodiments obtained by one of ordinary skill in the art, based on the embodiments provided herein and without inventive effort, fall within the scope of protection of the present application.
It is evident that the drawings are only examples or embodiments of the present application, and a person skilled in the art can apply the present application to other similar situations according to these drawings without inventive effort. In addition, it should be appreciated that although such a development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or production for those of ordinary skill in the art having the benefit of this disclosure, and thus should not be construed as meaning that the disclosure is insufficient.
The term "embodiment" in this application means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive. It will be clear or implicitly understood by those of ordinary skill in the art that the embodiments described in this application can be combined with other embodiments without conflict.
The above examples represent only a few embodiments of the present application. They are described specifically and in detail, but are not to be construed as limiting the scope of the patent. It should be noted that those skilled in the art can make various modifications and improvements without departing from the spirit of the present application, and these all fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. A method of data source identification, comprising:
acquiring log data, and transmitting the log data to a message cluster to obtain log messages;
acquiring a data source rule, and performing data source identification matching on the log message at least according to the data source rule to obtain a log identification result;
and carrying out the same data source aggregation on the log identification result according to the data source of the log identification result, obtaining and storing the data source identification result.
2. The method for identifying a data source according to claim 1, wherein the performing data source identification matching on the log message at least according to the data source rule to obtain a log identification result includes:
acquiring a sliding time window, acquiring a current window log message from the message cluster according to the sliding time window, and performing identification matching on the current window log message and the data source rule to obtain a current log identification result;
acquiring a next window log message according to the sliding time window, and identifying and matching the next window log message with the data source rule to obtain a next log identification result;
and obtaining the log identification result according to the current log identification result and the next log identification result.
3. The method for identifying data sources according to claim 2, wherein the step of performing the same data source aggregation on the log identification result according to the data source of the log identification result to obtain and store the data source identification result includes:
according to the sliding time window and the data source of the current log identification result, the same data source aggregation is carried out on the current log identification result, and the current data source identification result is obtained;
according to the sliding time window and the data source of the next log identification result, the same data source aggregation is carried out on the next log identification result, and the next data source identification result is obtained;
and obtaining the data source identification result according to the current data source identification result and the next data source identification result.
4. The method for identifying a data source according to claim 1, wherein said transmitting the log data to a message cluster to obtain a log message comprises:
acquiring a log regular-expression rule, and carrying out standardized processing on the log data according to the log regular-expression rule to obtain standardized log data;
and transmitting the standardized log data to the message cluster to obtain the log message.
5. The method for identifying a data source according to claim 1, wherein the obtaining the data source rule, performing data source identification matching on the log message at least according to the data source rule, and obtaining a log identification result, includes:
acquiring a current data source rule, and storing the current data source rule into a data source rule catalog;
and monitoring the data source rule catalog, and under the condition of monitoring the next data source rule, carrying out data source identification matching on the log message at least according to the current data source rule and the next data source rule to obtain a log identification result.
6. The data source identification method according to any one of claims 1 to 5, characterized by further comprising, before the acquisition of log data:
acquiring initialization configuration information of the message cluster, and starting a distributed coordination management service;
and synchronizing the initialization configuration information of the message cluster to the distributed coordination management service.
7. A data source identification device, comprising: the device comprises an acquisition module, an identification module and an aggregation module;
the acquisition module is used for acquiring log data and transmitting the log data to a message cluster to obtain log messages;
the identification module is used for acquiring a data source rule, and carrying out data source identification matching on the log message at least according to the data source rule to obtain a log identification result;
and the aggregation module is used for carrying out the same data source aggregation on the log identification result according to the data source of the log identification result, obtaining and storing the data source identification result.
8. A data source identification system, comprising: terminal equipment, transmission equipment and server equipment; the terminal equipment is connected with the server equipment through the transmission equipment;
the server device being configured to perform the data source identification method of any one of claims 1 to 6;
the transmission device is used for sending the log data and the data source rule to the server device;
the terminal equipment is used for acquiring log data and data source rules.
9. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, the processor being arranged to run the computer program to perform the data source identification method of any of claims 1 to 6.
10. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor realizes the steps of the data source identification method of any of claims 1 to 6.
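For readers who want a concrete picture of the windowed matching and aggregation recited in claims 2 and 3, the following is a minimal sketch in plain Python rather than in a stream processing engine. The window size, the slide interval, the tuple layout of a log message, the regular-expression rule format and the "unknown" label are all assumptions made for illustration; the claims do not fix any of them.

```python
import re
from collections import Counter
from typing import Dict, Iterator, List, Tuple

LogMessage = Tuple[int, str]   # (timestamp in seconds, message text), an assumed layout
Window = Tuple[int, int]       # half-open interval [start, end)


def sliding_windows(start: int, end: int, size: int, slide: int) -> Iterator[Window]:
    """Yield windows [w, w + size) that advance by `slide` seconds."""
    w = start
    while w < end:
        yield (w, w + size)
        w += slide


def identify(message: str, rules: Dict[str, re.Pattern]) -> str:
    """Return the first data source whose rule matches, else 'unknown'."""
    for name, rx in rules.items():
        if rx.search(message):
            return name
    return "unknown"


def aggregate_by_window(
    messages: List[LogMessage], rules: Dict[str, str], size: int, slide: int
) -> List[Tuple[Window, Dict[str, int]]]:
    """Match the messages of each window, then count them per data source."""
    if not messages:
        return []
    compiled = {name: re.compile(p) for name, p in rules.items()}
    first = min(ts for ts, _ in messages)
    last = max(ts for ts, _ in messages) + 1
    results = []
    for lo, hi in sliding_windows(first, last, size, slide):
        counts = Counter(
            identify(text, compiled) for ts, text in messages if lo <= ts < hi
        )
        results.append(((lo, hi), dict(counts)))
    return results


messages = [(0, "fw01 deny"), (4, "waf02 alert"), (7, "fw01 allow"), (11, "fw03 deny")]
rules = {"firewall": r"^fw\d+", "waf": r"^waf\d+"}
per_window = aggregate_by_window(messages, rules, size=10, slide=5)
```

Because the slide (5 seconds) is shorter than the window (10 seconds), consecutive windows overlap, so a message such as the one at second 7 is counted in both the current and the next window's result.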
CN202211480352.2A 2022-11-24 2022-11-24 Data source identification method, device, system, electronic device and storage medium Pending CN116069900A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211480352.2A CN116069900A (en) 2022-11-24 2022-11-24 Data source identification method, device, system, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211480352.2A CN116069900A (en) 2022-11-24 2022-11-24 Data source identification method, device, system, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN116069900A true CN116069900A (en) 2023-05-05

Family

ID=86168996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211480352.2A Pending CN116069900A (en) 2022-11-24 2022-11-24 Data source identification method, device, system, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN116069900A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination