CN111796993A - Data processing method and device, electronic equipment and computer readable storage medium - Google Patents

Info

Publication number
CN111796993A
Authority
CN
China
Prior art keywords
log data
service
service type
data
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910990260.0A
Other languages
Chinese (zh)
Other versions
CN111796993B (en)
Inventor
陈必成
林顺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Yaji Software Co Ltd
Original Assignee
Xiamen Yaji Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Yaji Software Co Ltd
Priority to CN201910990260.0A
Publication of CN111796993A
Application granted
Publication of CN111796993B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3438 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment; monitoring of user actions

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiment of the application provides a data processing method and device, electronic equipment and a computer readable storage medium, and relates to the field of big data processing. The method comprises the following steps: when a service index generation request is received, determining the service type of a service index to be generated, then acquiring log data of the service type, wherein the log data of each type are obtained by classifying the log data collected by a log collection server according to different service types, and then generating a corresponding service index based on the log data of the service type. According to the embodiment of the application, the cost and the complexity of data processing are reduced, the time for obtaining the index data can be reduced, and the user experience is further improved.

Description

Data processing method and device, electronic equipment and computer readable storage medium
Technical Field
The present application relates to the field of big data technologies, and in particular, to a data processing method, an apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development of information technology, the field of big data processing has developed as well. To record information such as the daily operation behaviors of a user, a system generates and stores an operation log of the user. The system can subsequently analyze the operation log of the user to determine various index data.
Currently, the operation log of the user is generally analyzed as follows: when each type of index data is determined, all the stored logs are traversed, the logs related to that type of index are identified, and those logs are analyzed to determine the index data. However, because the volume of stored user operation logs is huge and all the logs must be traversed and analyzed for each type of index data, the data processing cost and complexity are high, obtaining the index data takes a long time, and the user experience is poor.
Disclosure of Invention
The application provides a data processing method, a data processing device, an electronic device and a computer readable storage medium, which can solve at least one technical problem. The technical scheme is as follows:
in a first aspect, a data processing method is provided, and the method includes:
when a service index generation request is received, determining the service type of a service index to be generated;
acquiring log data of service types, wherein the log data of each type are obtained by classifying the log data collected by a log collection server according to different service types;
and generating a corresponding service index based on the log data of the service type.
In a possible implementation manner, before determining the service type of the service index to be generated, the method further includes:
acquiring log data from a log collection server, wherein the log data is generated by the log collection server based on a detected request in a preset format;
performing preset processing on the acquired log data to obtain the log data of each service type;
and respectively loading the log data of each service type into the corresponding logic table.
In another possible implementation manner, performing preset processing on the acquired log data to obtain the log data of each service type includes:
intercepting the acquired log data according to a first preset rule to obtain intercepted multiple sections of log data;
decoding each section of log data in the intercepted plurality of sections of log data;
carrying out format conversion processing on each section of log data after decoding processing;
and performing data mapping on each section of log data after format conversion according to the service type to obtain the log data of each service type.
In another possible implementation manner, before decoding each section of the intercepted log data, the method further includes:
decrypting the log data containing an encryption identifier in each intercepted section of log data;
decoding each section of intercepted log data then includes:
decoding each section of log data after decryption.
In another possible implementation manner, before decoding any section of the intercepted log data, the method further includes:
if the section of log data contains the encryption identifier, decrypting the section of log data.
In another possible implementation manner, after performing preset processing on the acquired log data to obtain the log data of each service type, the method further includes any one of the following:
storing the log data of each service type to a distributed file system according to the service type;
and storing the log data of each service type into a distributed file system according to the sub-service type partitions, wherein different partitions store the log data of different sub-service types.
In another possible implementation manner, respectively loading the log data of each service type into the corresponding logic table includes any one of the following:
respectively loading the log data of each service type stored in the distributed file system into corresponding logic tables;
and respectively loading the log data stored in each partition into the corresponding logic table.
In another possible implementation manner, acquiring the log data of the service type and generating a corresponding service index based on the log data of the service type includes any one of the following:
acquiring a logic table corresponding to the service type, and generating a corresponding service index based on the logic table corresponding to the service type, wherein the logic table corresponding to the service type comprises log data of the service type;
determining a sub-service type in the service type of the service index to be generated, acquiring a logic table corresponding to the sub-service type, and generating a corresponding service index based on the logic table corresponding to the sub-service type, wherein the logic table corresponding to the sub-service type comprises log data of the sub-service type.
In another possible implementation manner, the obtaining of the log data from the log collection server includes:
when a change in the log data in the log collection server is detected, obtaining the changed log data through the log collection system Flume and uploading it to a message queue (Kafka);
and pulling the changed log data from Kafka through Spark-streaming.
In another possible implementation manner, after obtaining the changed log data through Flume and Kafka, the method further includes:
storing the changed log data to a distributed file system through Flume and Kafka;
and performing data cleaning on the stored log data at preset time intervals through the distributed file system.
In a second aspect, there is provided a data processing apparatus comprising:
the determining module is used for determining the service type of the service index to be generated when a service index generating request is received;
the first acquisition module is used for acquiring the log data of the service types, and the log data of each type are obtained by classifying the log data collected by the log collection server according to different service types;
and the generating module is used for generating a corresponding service index based on the log data of the service type.
In one possible implementation, the apparatus further includes: a second obtaining module, a processing module, and a loading module, wherein,
the second acquisition module is used for acquiring log data from the log collection server, and the log data is generated by the log collection server based on the detected request in the preset format;
the processing module is used for carrying out preset processing on the acquired log data to obtain the log data of each service type;
and the loading module is used for loading the log data of each service type into the corresponding logic table respectively.
In another possible implementation manner, when the processing module performs preset processing on the acquired log data to obtain log data corresponding to each service type, the processing module is specifically configured to:
intercepting the acquired log data according to a first preset rule to obtain intercepted multiple sections of log data;
decoding each section of log data in the intercepted plurality of sections of log data;
carrying out format conversion processing on each section of log data after decoding processing;
and performing data mapping on each section of log data after format conversion according to the service type to obtain the log data of each service type.
In another possible implementation manner, the processing module is further configured to: before decoding each section of intercepted log data, decrypting each section of intercepted log data containing the encrypted identification;
when the processing module decodes the intercepted log data, the processing module is specifically configured to: and decoding each section of log data after decryption.
In another possible implementation manner, the processing module is further configured to: before any section of intercepted log data is decoded, and when any section of log data contains an encryption identifier, any section of log data is decrypted.
In another possible implementation manner, the apparatus further includes: a first storage module, wherein,
the first storage module is used for storing the log data of each service type to the distributed file system according to the service type, or storing the log data of each service type to the distributed file system in a partition mode according to the sub-service type, and storing the log data of different sub-service types in different partitions.
In another possible implementation manner, when the loading module loads the log data of each service type into the logic table, the loading module is specifically configured to:
respectively loading the log data of each service type stored in the distributed file system into corresponding logic tables; or,
and respectively loading the log data stored in each partition into the corresponding logic table.
In another possible implementation manner, when acquiring the log data of the service type, the first acquiring module is specifically configured to: acquiring a logic table corresponding to the service type;
the generating module is specifically configured to, when generating a corresponding service index based on the log data of the service type: and generating a corresponding service index based on a logic table corresponding to the service type, wherein the logic table corresponding to the service type comprises log data of the service type.
In another possible implementation manner, when acquiring the log data of the service type, the first acquiring module is specifically configured to: determining a sub-service type in the service types of the service indexes to be generated, and acquiring a logic table corresponding to the sub-service type;
the generating module is specifically configured to, when generating a corresponding service index based on the log data of the service type: and generating a corresponding service index based on a logic table corresponding to the sub-service type, wherein the logic table corresponding to the sub-service type comprises log data of the sub-service type.
In another possible implementation manner, when the second obtaining module obtains the log data from the log collecting server, the second obtaining module is specifically configured to:
when a change in the log data in the log collection server is detected, obtaining the changed log data through the log collection system Flume and uploading it to a message queue (Kafka);
and pulling the changed log data from Kafka through Spark-streaming.
In another possible implementation manner, the apparatus further includes: a second storage module and a data cleaning module, wherein,
the second storage module is used for storing the changed log data to the distributed file system in real time through Flume and Kafka;
and the data cleaning module is used for cleaning the stored log data at preset time intervals through the distributed file system.
In a third aspect, an electronic device is provided, which includes:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to: and executing the corresponding operation of the data processing method according to the first aspect or any possible implementation manner of the first aspect.
In a fourth aspect, there is provided a computer readable storage medium having stored thereon at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement the data processing method according to the first aspect or any possible implementation manner of the first aspect.
The beneficial effects of the technical solution provided by the present application are as follows:
compared with the prior art that all log data need to be traversed when each type of index data is determined, the method and the device determine the service type of the service index to be generated when a service index generation request is received, then obtain the log data of the service type, classify the log data collected by the log collection server according to different service types to obtain each type of log data, and then generate the corresponding service index based on the log data of the service type. The collected log data are classified according to different service types in advance, when a certain service index is generated, only the log data of the service index type need to be obtained from the log data classified in advance, and all log data do not need to be traversed, so that the cost and complexity of data processing can be reduced, the time for obtaining the index data can be reduced, and the user experience can be further improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flowchart of a data processing method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device for data processing according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The terms referred to in this application will first be introduced and explained:
Flume: a highly available, highly reliable, distributed system for collecting, aggregating, and transmitting massive volumes of logs. Flume supports customizing various data senders in a log system for collecting data; it also provides the ability to perform simple processing on data and write it to various (customizable) data receivers.
Kafka: an open-source stream processing platform written in Scala and Java. Kafka is a high-throughput distributed publish-subscribe messaging system that can handle all the action stream data of consumers on a website. Because of throughput requirements, such data is typically handled through log processing and log aggregation. The purpose of Kafka is to unify online and offline message processing through the parallel loading mechanism of Hadoop, and to provide real-time messages through a cluster.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
An embodiment of the present application provides a data processing method, which may be executed by an electronic device. As shown in fig. 1, the method includes:
step S101, when a service index generation request is received, determining the service type of a service index to be generated.
For the embodiment of the present application, the service type of the service index to be generated, determined based on the service index generation request, may include one service type or at least two service types. The embodiments of the present application are not limited in this respect.
For example, the service type of the service indicator to be generated may include: user class indicators, consumption class indicators, and event class indicators.
And step S102, acquiring the log data of the service type.
The log data of each type is obtained by classifying the log data collected by the log collection server according to different service types.
For the embodiment of the application, the log data collected by the log collection server can be classified according to different service types in advance. The specific implementation manner for classifying the log data collected by the log collection server according to different service types in the embodiment of the present application is described in detail in the following embodiments, and is not described herein again.
For example, log data collected by the log collection server is classified in advance according to a user class, a consumption class, and an event class, and log data related to a user operation, log data related to consumption, and log data related to an event are obtained.
In the foregoing embodiment, classifying the log data collected by the log collection server according to event classes may specifically mean classifying it in advance according to user-defined events or according to preset events, which is not limited in this embodiment. For example, the user-defined event or the preset event may include any event such as a boot event or an application exit event.
For the embodiment of the application, after the log data received by the log collection server is classified according to different service types, the log data related to the service type can be directly obtained to obtain the corresponding service index.
And step S103, generating a corresponding service index based on the log data of the service type.
A specific example based on step S101, step S102, and step S103 is: when a service index generation request is received, determining that the service type of a service index to be generated is a user type, then acquiring log data related to the user, and then generating a user type index based on the acquired log data related to the user.
Compared with the prior art that all log data need to be traversed when each type of index data is determined, the data processing method determines the service type of the service index to be generated when a service index generation request is received, then obtains the log data of the service type, and classifies the log data collected by the log collection server according to different service types to obtain the log data of each type, and then generates the corresponding service index based on the log data of the service type. In other words, in the embodiment of the present application, collected log data are classified in advance according to different service types, and when a certain service index is generated, only the log data of the service index type needs to be acquired from the log data classified in advance, and all log data do not need to be traversed, so that the cost and complexity of data processing can be reduced, the time for acquiring the index data can be reduced, and the user experience can be further improved.
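The flow of steps S101 to S103 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the application: the in-memory store `LOGS_BY_TYPE`, the record fields `service_type` and `user_id`, and both function names are hypothetical, and counting distinct users is just one possible index.

```python
from collections import defaultdict

# Hypothetical pre-classified store: service type -> list of log records.
LOGS_BY_TYPE = defaultdict(list)

def classify(records):
    """Classify collected records by service type ahead of any request."""
    for record in records:
        LOGS_BY_TYPE[record["service_type"]].append(record)

def generate_index(request):
    """Resolve the service type, fetch only its logs, and compute an index."""
    service_type = request["service_type"]        # step S101
    logs = LOGS_BY_TYPE.get(service_type, [])     # step S102: no full traversal
    return {                                      # step S103
        "service_type": service_type,
        "record_count": len(logs),
        "distinct_users": len({r["user_id"] for r in logs}),
    }

classify([
    {"service_type": "user", "user_id": "u1"},
    {"service_type": "user", "user_id": "u2"},
    {"service_type": "consumption", "user_id": "u1"},
])
result = generate_index({"service_type": "user"})
```

Because classification happens before the request arrives, the request path touches only the records stored under the requested service type.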
In a possible implementation manner of the embodiment of the present application, before the step S101, the method may further include: step Sa (not shown), step Sb (not shown), and step Sc (not shown), wherein,
and step Sa, acquiring log data from the log collection server.
Wherein the log data is generated by the log collection server based on the detected request in the preset format.
For the embodiment of the present application, before acquiring log data from the log collection server, the method may further include: when the log collection server (Nginx) detects a request in a preset format, generating a log based on the request and storing the generated log to a log server.
For example, when a request in the format "https://logstorage.cos.com/log/v1?" + [URL-encoded JSON string] is detected, a log is generated based on the request data and the generated log is stored in the log server; more specifically, the generated log may be stored in a file of /var/log/nginx/metrics-access.
From the above, it can be seen that: the log collection server may generate log data, and thus, when a change in the log data in the log collection server is detected, for example, when new log data is added, the log data is acquired from the log collection server.
Specifically, acquiring the log data from the log collection server includes: when a change in the log data in the log collection server is detected, obtaining the changed log data through the log collection system Flume and uploading it to a message queue (Kafka); and pulling the changed log data from Kafka through Spark-streaming.
Pulling the changed log data from Kafka through Spark-streaming may specifically include: Spark-streaming calling the Kafka application programming interface (API) to pull the changed log data from Kafka.
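The idea of picking up only the changed log data can be illustrated with a small stand-in. The sketch below does not use the Flume, Kafka, or Spark-streaming APIs; it simply remembers a byte offset into a log file and returns the lines appended since the last poll. The function name `read_new_lines` and the file name are hypothetical.

```python
import os
import tempfile

def read_new_lines(path, offset):
    """Return (complete lines appended after byte `offset`, new offset)."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only complete lines; a partial trailing line waits for the next poll.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end

with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "access.log")
    with open(log_path, "w", encoding="utf-8") as f:
        f.write("first\n")
    batch1, pos = read_new_lines(log_path, 0)

    with open(log_path, "a", encoding="utf-8") as f:
        f.write("second\nthird\n")
    batch2, pos = read_new_lines(log_path, pos)
```

In a real deployment each batch would be published to the message queue instead of being kept in a local variable.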
For the embodiment of the present application, after obtaining the changed log data through Flume and Kafka, the method may further include: storing the changed log data to a distributed file system through Flume and Kafka; and performing data cleaning on the stored log data at preset time intervals through the distributed file system.
Specifically, to avoid the problem that log data cannot be found during subsequent processing, a dump policy may be configured in Flume and Kafka in advance to back up the original log data; that is, the acquired log data is dumped to the distributed file system at specific time intervals. Further, the acquired log data can be dumped to a specific directory under the distributed file system, and the data under that directory is later used for data backtracking and problem location.
For example, the acquired log data is transferred to the distributed file system every 60 seconds.
For the embodiment of the application, because the acquired log data is dumped to the distributed file system at regular intervals, the log data stored in the distributed file system is cleaned at preset intervals to avoid occupying a large amount of its storage space.
For example, log data stored in the last 60 days is cleaned up every 60 days.
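A periodic cleanup of this kind can be sketched as below. The sketch assumes one reading of the example, namely a retention window in which files older than a given number of days are deleted; a local directory stands in for the distributed file system, and the function name `clean_old_files` is hypothetical.

```python
import os
import tempfile
import time

def clean_old_files(directory, max_age_days, now=None):
    """Delete regular files in `directory` last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed

with tempfile.TemporaryDirectory() as tmp:
    old_path = os.path.join(tmp, "old.log")
    new_path = os.path.join(tmp, "new.log")
    for p in (old_path, new_path):
        with open(p, "w") as f:
            f.write("x\n")
    now = time.time()
    # Backdate one file by 61 days so that it falls outside the window.
    os.utime(old_path, (now - 61 * 86400, now - 61 * 86400))
    removed = clean_old_files(tmp, max_age_days=60, now=now)
    remaining = sorted(os.listdir(tmp))
```

Running such a function from a scheduler at the preset interval keeps the backup directory bounded in size.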
Of course, obtaining the log data from the log collection server may alternatively include: when a change in the log data in the log collection server is detected, obtaining the changed log data through DIS; and pulling the changed log data from DIS through Spark-streaming calling the DIS application programming interface (API).
Further, when the changed log data is obtained through DIS, the log data may also be dumped to a loc/rowlog file in a specific directory.
DIS processes log data in a manner similar to Flume and Kafka, as described above, and the description is not repeated here.
The embodiment of the present application is not limited to the above processing of the log by Flume and Kafka or the processing of the log by DIS.
And step Sb, performing preset processing on the acquired log data to obtain the log data of each service type.
In the above embodiment, after the log data is acquired from the log collection server, the log data needs to be parsed and decoded to obtain the log data of each service type. Specifically, the acquired log data is parsed and decoded through Spark-streaming, so that the log data of each service type is obtained.
The specific way of performing the preset processing on the acquired log data is described in detail in the following embodiments, and is not described herein again.
And step Sc, loading the log data of each service type into a corresponding logic table respectively.
For the embodiment of the present application, after the log data of each type is obtained, the log data of each type may be loaded into the corresponding logic table, so that the subsequent analysis processing device may obtain the latest log data. The logic table in the embodiment of the present application may include a Hive table.
For example, each type of log data may be loaded into a corresponding logic table through a job script DLI (or Hadoop-Hive) configured in a scheduling service DLF (or a scheduling job process). In the embodiment of the application, DLF (Data Lake Factory) provides a one-stop big data collaborative development platform on which a user can complete tasks such as data modeling, data integration, script development, job scheduling, and operation and maintenance monitoring, which greatly lowers the threshold for using big data and helps the user quickly build a big data processing center. Further, the embodiment of the present application is not limited to invoking DLF to load log data into a corresponding logic table; any manner of loading log data into a logic table is within the protection scope of the embodiment of the present application.
Hive in Hadoop-Hive is a data warehouse infrastructure built on Hadoop. It provides a set of tools for data extraction, transformation, and loading (ETL), and is a mechanism that can store, query, and analyze large-scale data stored in Hadoop.
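The effect of loading each service type into its own logic table can be illustrated with a small stand-in. The sketch below uses Python's built-in `sqlite3` module in place of Hive, DLI, and DLF, so it shows the idea only; the table naming scheme `log_<service type>`, the column names, and the function `load_into_table` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def load_into_table(conn, service_type, rows):
    """Load (user_id, payload) rows into the table for one service type."""
    table = f"log_{service_type}"
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table}")
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (user_id TEXT, payload TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)

load_into_table(conn, "user", [("u1", "login"), ("u2", "login"), ("u1", "logout")])
load_into_table(conn, "consumption", [("u1", "purchase")])

# A user-class index reads only the user table, not every stored log.
active_users = conn.execute(
    "SELECT COUNT(DISTINCT user_id) FROM log_user"
).fetchone()[0]
```

A query for a user-class index never scans `log_consumption`, which is the saving the embodiment relies on.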
In another possible implementation manner of the embodiment of the present application, step Sb specifically may include: step Sb1 (not shown), step Sb2 (not shown), step Sb3 (not shown), and step Sb4 (not shown), wherein,
and step Sb1, intercepting the acquired log data according to a first preset rule to obtain intercepted multiple sections of log data.
Specifically, the acquired log data can be intercepted using \t as the identifier.
For example, if the acquired log data is XXXX\t×, it can be cut into two segments using \t as the identifier.
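Interception on the tab identifier can be sketched in a couple of lines. The function name `intercept` and the sample content `YYYY` are placeholders chosen for illustration.

```python
def intercept(raw_line, delimiter="\t"):
    """Split one raw log line into sections on the delimiter."""
    return raw_line.rstrip("\n").split(delimiter)

sections = intercept("XXXX\tYYYY\n")
```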
And step Sb2, decoding each section of log data in the intercepted multiple sections of log data.
For the embodiment of the present application, after the acquired log data is intercepted, urldecode processing may be performed on the intercepted log data to obtain decoded data. Here, urldecode refers to decoding data that was encoded with the Uniform Resource Locator (URL) encoding scheme.
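As a brief illustration of this decoding step, Python's standard `urllib.parse` module provides `quote` and `unquote` for URL encoding and decoding. The sample JSON string is made up for the example.

```python
from urllib.parse import quote, unquote

original = '{"event": "app start", "uid": "u1"}'
encoded = quote(original)    # what a client would place in the request
decoded = unquote(encoded)   # the urldecode step applied to a log section
```

If the client encodes spaces as `+` (form encoding), `unquote_plus` from the same module is the matching decoder.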
Further, since some of the intercepted segments may contain encrypted data, before step Sb2 the method may further include: decrypting those intercepted segments of log data that contain the encryption identifier.
Specifically, it is determined whether each intercepted segment of log data contains the encryption identifier. For example, V1 may indicate that the segment is encrypted log data and V2 that it is unencrypted log data. Once the encrypted segments are identified, they are decrypted. If some intercepted segments have been decrypted in this way, step Sb2 may specifically include: decoding each segment of log data after decryption.
The specific decoding method is detailed above and is not described again.
In the above embodiment, the decoding of step Sb2 is executed after the encryption judgment and the decryption processing have been performed on all intercepted segments. Alternatively, the judgment may be performed on one segment at a time: if the segment is encrypted, it is decrypted and then directly decoded, after which the judgment is performed on the next intercepted segment. That is, before the decoding processing is performed on any intercepted segment of log data, the method further includes: if that segment of log data contains the encryption identifier, decrypting that segment of log data.
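The segment-by-segment variant can be sketched as follows. Several details here are assumptions, since the application does not specify them: the identifier is taken to be a prefix separated from the payload by `:`, and the cipher is left to a caller-supplied `decrypt` function. The demo uses Base64 decoding purely as a runnable stand-in; it is not encryption.

```python
import base64
from typing import Callable, List
from urllib.parse import unquote

ENCRYPTED_MARK = "V1"   # identifier for encrypted segments (from the example in the text)
PLAINTEXT_MARK = "V2"   # identifier for unencrypted segments


def decrypt_then_decode(segments: List[str], decrypt: Callable[[str], str]) -> List[str]:
    """Handle segments one at a time: check the identifier, decrypt if needed, then URL-decode."""
    result = []
    for segment in segments:
        # Assumption: "<identifier>:<payload>" layout; the real layout is not given.
        mark, _, payload = segment.partition(":")
        if mark == ENCRYPTED_MARK:
            payload = decrypt(payload)
        elif mark != PLAINTEXT_MARK:
            raise ValueError(f"unknown encryption identifier: {mark!r}")
        result.append(unquote(payload))
    return result


def demo_decrypt(payload: str) -> str:
    """Stand-in for a real cipher; Base64 only keeps the demo self-contained."""
    return base64.b64decode(payload).decode("utf-8")


encrypted = "V1:" + base64.b64encode(b"uid%3D42").decode("ascii")
print(decrypt_then_decode([encrypted, "V2:event%3Dlogin"], demo_decrypt))
```

Passing the cipher in as a function keeps the control flow (judge, decrypt, decode) separate from the choice of algorithm, which the text leaves open.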
Step Sb3: performing format conversion processing on each segment of log data after the decoding processing.
For the embodiment of the application, after each segment of data is decoded, each decoded segment can be converted into JSON format. JSON is short for JavaScript Object Notation and is a lightweight data representation format that records data as key-value pairs.
Specifically, each decoded segment of log data is converted into JSON format by fastjson.
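As an illustration of step Sb3, the sketch below converts a decoded segment into a JSON object using Python's standard `json` module rather than fastjson, which is a Java library. It assumes, hypothetically, that a decoded segment is a list of `key=value` pairs joined by `&`; the application does not state the layout of a decoded segment.

```python
import json


def segment_to_json(decoded_segment: str) -> str:
    """Convert a decoded 'key=value&key=value' segment into a JSON object string."""
    record = {}
    for pair in decoded_segment.split("&"):
        if not pair:
            continue  # skip empty pieces such as a trailing '&'
        key, _, value = pair.partition("=")
        record[key] = value
    # sort_keys gives a stable key order, which makes outputs easy to compare.
    return json.dumps(record, ensure_ascii=False, sort_keys=True)


print(segment_to_json("uid=42&event=login"))
```

The pairs are split by hand rather than with a query-string parser so that the already decoded text is not percent-decoded a second time.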
Step Sb4: performing data mapping on each format-converted segment of log data according to service type to obtain the log data of each service type.
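The data mapping of step Sb4 amounts to grouping records by service type. The sketch below assumes each JSON record carries a field naming its service type; the field name `type` and the sample values are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, List


def map_by_service_type(json_records: List[str], type_field: str = "type") -> Dict[str, List[dict]]:
    """Group JSON log records by the value of their service-type field."""
    grouped = defaultdict(list)
    for raw in json_records:
        record = json.loads(raw)
        # Records without the field are kept under "unknown" rather than dropped.
        grouped[record.get(type_field, "unknown")].append(record)
    return dict(grouped)


records = [
    '{"type": "user", "uid": "1"}',
    '{"type": "event", "uid": "1", "name": "login"}',
    '{"type": "user", "uid": "2"}',
]
print(sorted(map_by_service_type(records)))
```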
In another possible implementation manner of the embodiment of the present application, after the log data of each service type is obtained, it needs to be stored for further processing. Step Sb may therefore further include: storing the log data of each service type in a distributed file system according to service type; or storing the log data of each service type in the distributed file system in partitions according to sub-service type.
Wherein different partitions store log data for different sub-service types.
Specifically, the data processed by the Spark Streaming program is placed in specific storage directories, for example the user directory (user), the consumption-detail directory (depend_detail), and the event-detail directory (event_detail) under the initialization directory (init) under the analysis directory (analyze) of the bucket. Under analyze/init, the data is further classified by platform.
Of course, in order to obtain more refined log data, the log data of each service type may be classified according to sub-service types, and stored in the distributed file system in a partitioned manner. In the embodiment of the present application, the sub-service types may be obtained by further finely dividing the service types.
Specifically, the logs of each service type may be classified and stored in partitions by project and by time.
For example, user-class log data is stored in partitions under the following directory:
bucket list/analyze/init/user/{service partition}/{date};
wherein the service partition can be any defined characters.
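A partition directory of this shape could be built as in the following sketch. The bucket name, the partition value, and the ISO date format are all assumptions made for illustration; the application only gives the directory layout.

```python
from datetime import date
from pathlib import PurePosixPath


def partition_path(bucket: str, service_type: str, service_partition: str, day: date) -> str:
    """Build <bucket>/analyze/init/<service type>/<service partition>/<date>."""
    return str(PurePosixPath(bucket, "analyze", "init", service_type,
                             service_partition, day.isoformat()))


# Hypothetical bucket and partition names.
print(partition_path("my-bucket", "user", "game_a", date(2019, 9, 17)))
```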
In this embodiment, the log data of each service type is stored in the distributed file system according to service type, or is stored in the distributed file system in partitions according to sub-service type. Accordingly, loading the log data of each service type into the corresponding logic table includes: loading the log data of each service type stored in the distributed file system into the corresponding logic table; or loading the log data stored in each partition into the corresponding logic table.
Specifically, the data directories generated in the file system each day can be loaded, partition by partition, into the corresponding logic tables by Hadoop-Hive as configured in the scheduling job process.
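One way such a daily load could be expressed is with Hive's `ALTER TABLE ... ADD PARTITION` statement, which attaches a directory to a partitioned table. The sketch below only generates the statement text; the table name `user_log`, the partition columns `service` and `dt`, and the location are hypothetical, and the application does not say which Hive statement is actually used.

```python
def add_partition_statement(table: str, service_partition: str, day: str, location: str) -> str:
    """Return a HiveQL statement that attaches one partition directory to a table."""
    # Inputs are interpolated without escaping, so they must come from trusted configuration.
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (service='{service_partition}', dt='{day}') "
        f"LOCATION '{location}'"
    )


print(add_partition_statement(
    "user_log", "game_a", "2019-09-17",
    "my-bucket/analyze/init/user/game_a/2019-09-17",
))
```

A scheduled job would run one such statement per new directory each day, so that queries against the logic table see the new partition.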
In the above embodiment, either the log data of each service type or the log data stored in each partition may be loaded into the corresponding logic table, where the log data stored in each partition is the log data of a sub-service type of a service type. Step S102 and step S103 may therefore specifically include step S1021 (not shown in the figure) and step S1031 (not shown in the figure), or alternatively step S1022 (not shown in the figure) and step S1032 (not shown in the figure), wherein:
Step S1021: acquiring a logic table corresponding to the service type.
Step S1031: generating the corresponding service index based on the logic table corresponding to the service type.
The logic table corresponding to the service type contains the log data of that service type.
Step S1022: determining a sub-service type within the service type of the service index to be generated, and acquiring a logic table corresponding to the sub-service type.
Step S1032: generating the corresponding service index based on the logic table corresponding to the sub-service type.
The logic table corresponding to the sub-service type contains the log data of that sub-service type.
For the embodiment of the present application, if the service index requested in the service index generation request is only a service index of a certain service type, for example a service index of the user class, or if the distributed storage files are stored only according to service type and are not partitioned according to sub-service type, step S1021 and step S1031 may be executed to obtain the service index. If the service index requested in the service index generation request is a service index of a sub-service type under a certain service type, for example a service index for a certain date (e.g., 2019.09.17) under the user class, or if the distributed storage files are partitioned only according to sub-service type, step S1022 and step S1032 may be executed. The embodiment is not limited to the above.
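The choice between the two pairs of steps can be sketched with an in-memory stand-in for the logic tables. Everything concrete here is hypothetical: the table keys, the `uid` field, and the use of a distinct-user count as the service index.

```python
from typing import Dict, List, Optional, Tuple

# In-memory stand-in for logic tables, keyed by (service type, sub-service type or None).
LogicTables = Dict[Tuple[str, Optional[str]], List[dict]]


def generate_index(tables: LogicTables, service_type: str, sub_type: Optional[str] = None) -> int:
    """Read only the table matching the request and count the distinct users in it."""
    rows = tables.get((service_type, sub_type), [])
    return len({row["uid"] for row in rows})


tables = {
    ("user", None): [{"uid": "1"}, {"uid": "2"}, {"uid": "2"}],
    ("user", "2019.09.17"): [{"uid": "2"}],
    ("event", None): [{"uid": "9"}],
}
print(generate_index(tables, "user"))                # service-type table (S1021, S1031)
print(generate_index(tables, "user", "2019.09.17"))  # sub-service-type table (S1022, S1032)
```

In both calls only one table is read, which is the point of classifying the log data in advance.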
For the embodiment of the present application, the processing of log data and the generation of the corresponding service index described above are performed in the logic layer of the data structure; after a service index is generated, it may be displayed in a preset format in the display layer.
For example, the user index, the consumption index, the event index, and the like are displayed in the display layer in Excel format.
The foregoing embodiments describe the data processing method from the perspective of the method flow. The following embodiments describe the data processing apparatus from the perspective of virtual modules or virtual units, as detailed below:
An embodiment of the present application provides a data processing apparatus. As shown in fig. 2, the data processing apparatus 20 may include a determination module 21, a first acquisition module 22, and a generation module 23, wherein:
The determining module 21 is configured to determine the service type of a service index to be generated when a service index generation request is received.
The first obtaining module 22 is configured to obtain log data of the service type.
The log data of each type is obtained by classifying the log data collected by the log collection server according to different service types.
The generating module 23 is configured to generate a corresponding service index based on the log data of the service type.
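The division of labor among the three modules can be pictured as one class with one method per module. This is only a structural sketch: the request format (a dict with a `service_type` key) and the record-count index are hypothetical placeholders.

```python
from typing import Dict, List


class DataProcessingApparatus:
    """Sketch of apparatus 20: determination (21), first acquisition (22), generation (23)."""

    def __init__(self, classified_logs: Dict[str, List[dict]]):
        # Log data already classified by service type, as the text requires.
        self._classified_logs = classified_logs

    def determine_service_type(self, request: dict) -> str:
        # Assumption: the request names its service type under the 'service_type' key.
        return request["service_type"]

    def acquire_log_data(self, service_type: str) -> List[dict]:
        return self._classified_logs.get(service_type, [])

    def generate_index(self, rows: List[dict]) -> dict:
        # Placeholder index: the number of log records of the requested type.
        return {"record_count": len(rows)}

    def handle(self, request: dict) -> dict:
        service_type = self.determine_service_type(request)
        return self.generate_index(self.acquire_log_data(service_type))


apparatus = DataProcessingApparatus({"user": [{"uid": "1"}, {"uid": "2"}], "event": []})
print(apparatus.handle({"service_type": "user"}))
```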
In a possible implementation manner of the embodiment of the present application, the apparatus 20 further includes a second obtaining module, a processing module, and a loading module, wherein:
The second obtaining module is configured to obtain the log data from the log collection server.
The log data is generated by the log collection server based on detected requests in a preset format.
The first obtaining module 22 and the second obtaining module may be the same obtaining module or different obtaining modules, which is not limited in the embodiments of the present application.
The processing module is configured to perform preset processing on the acquired log data to obtain the log data of each service type.
The loading module is configured to load the log data of each service type into the corresponding logic table.
In another possible implementation manner of the embodiment of the application, when the processing module performs preset processing on the acquired log data to obtain the log data corresponding to each service type, the processing module is specifically configured to:
intercepting the acquired log data according to a first preset rule to obtain intercepted multiple sections of log data;
decoding each section of log data in the intercepted plurality of sections of log data;
carrying out format conversion processing on each section of log data after decoding processing;
and performing data mapping on each section of log data after format conversion according to the service type to obtain the log data of each service type.
In another possible implementation manner of the embodiment of the present application, the processing module is further configured to: before decoding each section of intercepted log data, decrypting each section of intercepted log data containing the encrypted identification; when the processing module decodes the intercepted log data, the processing module is specifically configured to: and decoding each section of log data after decryption.
In another possible implementation manner of the embodiment of the present application, the processing module is further configured to: before any intercepted section of log data is decoded, decrypt that section of log data if it contains the encryption identifier.
In another possible implementation manner of the embodiment of the present application, the apparatus 20 further includes: a first storage module, wherein,
the first storage module is used for storing the log data of each service type to the distributed file system according to the service type, or storing the log data of each service type to the distributed file system in a partition mode according to the sub-service type, and storing the log data of different sub-service types in different partitions.
In another possible implementation manner of the embodiment of the present application, when the loading module loads log data of each service type into a corresponding logic table, the loading module is specifically configured to: respectively loading the log data of each service type stored in the distributed file system into corresponding logic tables; or the log data stored in each partition is loaded into the corresponding logic table respectively.
In another possible implementation manner of the embodiment of the present application, when acquiring the log data of the service type, the first acquiring module 22 is specifically configured to: acquiring a logic table corresponding to the service type; when the generating module 23 generates a corresponding service index based on the log data of the service type, it is specifically configured to: and generating a corresponding service index based on a logic table corresponding to the service type, wherein the logic table corresponding to the service type comprises log data of the service type.
In another possible implementation manner of the embodiment of the present application, when acquiring the log data of the service type, the first acquiring module 22 is specifically configured to: determining a sub-service type in the service types of the service indexes to be generated, and acquiring a logic table corresponding to the sub-service type; when the generating module 23 generates a corresponding service index based on the log data of the service type, it is specifically configured to: and generating a corresponding service index based on a logic table corresponding to the sub-service type, wherein the logic table corresponding to the sub-service type comprises log data of the sub-service type.
In another possible implementation manner of the embodiment of the application, when the second obtaining module obtains the log data from the log collecting server, the second obtaining module is specifically configured to:
when a change in the log data in the log collection server is detected, obtaining the changed log data through the log collection system Flume and uploading it to the message queue Kafka;
and pulling the changed log data from Kafka through Spark Streaming.
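The collect, queue, and pull flow can be illustrated with an in-memory stand-in. The sketch below does not use Flume, Kafka, or Spark Streaming APIs at all: a `queue.Queue` plays the role of the message queue, and the two hypothetical functions play the collector and consumer roles. It also assumes the log is append-only, so that a change means new lines at the end.

```python
import queue
from typing import List


def collect_changes(previous: List[str], current: List[str], message_queue: "queue.Queue") -> None:
    """Collector role (Flume in the text): publish only the newly appended log lines."""
    for line in current[len(previous):]:
        message_queue.put(line)


def pull_batch(message_queue: "queue.Queue", max_items: int = 100) -> List[str]:
    """Consumer role (Spark Streaming in the text): drain up to max_items messages as one batch."""
    batch = []
    while len(batch) < max_items:
        try:
            batch.append(message_queue.get_nowait())
        except queue.Empty:
            break
    return batch


mq = queue.Queue()
collect_changes(["line1"], ["line1", "line2", "line3"], mq)
print(pull_batch(mq))
```

The queue decouples the two sides: the collector never waits for processing, and the consumer pulls at its own pace in small batches.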
In another possible implementation manner of the embodiment of the present application, the apparatus 20 further includes a second storage module and a data cleaning module, wherein:
The second storage module is configured to store the changed log data in the distributed file system in real time through Flume and Kafka.
The data cleaning module is configured to perform data cleaning on the stored log data at preset time intervals through the distributed file system.
For the embodiment of the present application, the first storage module and the second storage module may be the same storage module or different storage modules, and are not limited in the embodiment of the present application.
In the prior art, all log data must be traversed each time any type of index data is determined. By contrast, when the data processing apparatus provided by the embodiment of the present application receives a service index generation request, it determines the service type of the service index to be generated, acquires the log data of that service type, and generates the corresponding service index based on that log data, where the log data of each type is obtained by classifying the log data collected by the log collection server according to service type. Because the collected log data is classified by service type in advance, generating a given service index only requires obtaining the log data of the corresponding service type from the pre-classified log data, without traversing all log data. This reduces the cost and complexity of data processing, shortens the time needed to obtain the index data, and thereby improves the user experience.
The data processing apparatus of this embodiment can execute the data processing method provided in this embodiment, and the implementation principles thereof are similar and will not be described herein again.
The above embodiments describe the data processing method from the perspective of the method flow and the data processing apparatus from the perspective of virtual modules or virtual units. The following embodiments describe an electronic device, which may include a cloud device, a local server, and a terminal device, and which may be configured to execute the operations corresponding to the data processing method in the foregoing method embodiments, as detailed below:
An embodiment of the present application provides an electronic device. As shown in fig. 3, the electronic device 3000 includes a processor 3001 and a memory 3003. The processor 3001 is coupled to the memory 3003, for example via a bus 3002. Optionally, the electronic device 3000 may further comprise a transceiver 3004. It should be noted that, in practical applications, the number of transceivers 3004 is not limited to one, and the structure of the electronic device 3000 does not constitute a limitation on the embodiment of the present application.
The processor 3001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 3001 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 3002 may include a path that conveys information between the aforementioned components. The bus 3002 may be a PCI bus or an EISA bus, etc. The bus 3002 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 3, but this does not mean only one bus or one type of bus.
Memory 3003 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disk storage (including compact disk, laser disk, optical disk, digital versatile disk, blu-ray disk, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 3003 is used for storing application program codes for performing the present scheme, and is controlled to be executed by the processor 3001. The processor 3001 is configured to execute application program code stored in the memory 3003 to implement any of the method embodiments shown above.
An embodiment of the present application provides an electronic device that includes a memory and a processor, with at least one program stored in the memory for execution by the processor. When the program is executed by the processor, the following is implemented: when a service index generation request is received, the service type of the service index to be generated is determined; the log data of that service type is then acquired, where the log data of each type is obtained by classifying the log data collected by a log collection server according to service type; and the corresponding service index is generated based on the log data of that service type. Because the collected log data is classified by service type in advance, generating a given service index only requires obtaining the log data of the corresponding service type from the pre-classified log data, without traversing all log data. This reduces the cost and complexity of data processing, shortens the time needed to obtain the index data, and thereby improves the user experience.
The present application provides a computer-readable storage medium, on which a computer program is stored, which, when running on a computer, enables the computer to execute the corresponding content in the foregoing method embodiments. Compared with the prior art, the method and the device for generating the service indexes have the advantages that when a service index generation request is received, the service type of the service index to be generated is determined, then the log data of the service type are obtained, the log data of various types are obtained by classifying the log data collected by the log collection server according to different service types, and then the corresponding service index is generated based on the log data of the service type. The collected log data are classified according to different service types in advance, when a certain service index is generated, only the log data of the service index type need to be obtained from the log data classified in advance, and all log data do not need to be traversed, so that the cost and complexity of data processing can be reduced, the time for obtaining the index data can be reduced, and the user experience can be further improved.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also fall within the protection scope of the present invention.

Claims (13)

1. A data processing method, comprising:
when a service index generation request is received, determining the service type of a service index to be generated;
acquiring log data of the service types, wherein the log data of each type are obtained by classifying the log data collected by a log collection server according to different service types;
and generating a corresponding service index based on the log data of the service type.
2. The method of claim 1, wherein before the determining the service type of the service index to be generated, the method further comprises:
acquiring log data from a log collection server, wherein the log data is generated by the log collection server based on a detected request in a preset format;
performing preset processing on the acquired log data to obtain the log data of each service type;
and respectively loading the log data of each service type into the corresponding logic table.
3. The method according to claim 2, wherein the performing the preset processing on the acquired log data to obtain the log data corresponding to each service type includes:
intercepting the acquired log data according to a first preset rule to obtain intercepted multiple sections of log data;
decoding each section of log data in the intercepted plurality of sections of log data;
carrying out format conversion processing on each section of log data after decoding processing;
and performing data mapping on each section of log data after format conversion according to the service type to obtain the log data of each service type.
4. The method of claim 3, wherein before the decoding of each section of intercepted log data, the method further comprises:
carrying out decryption processing on the log data containing the encrypted identification in each intercepted segment of log data;
wherein, the decoding process of each section of intercepted log data includes:
and decoding each section of log data after decryption.
5. The method of claim 3, wherein before decoding any section of intercepted log data, the method further comprises:
if that section of log data contains the encryption identifier, decrypting that section of log data.
6. The method according to any one of claims 2 to 5, wherein after the preset processing is performed on the acquired log data to obtain the log data of each service type, the method further comprises any one of the following:
storing the log data of each service type to a distributed file system according to the service type;
and storing the log data of each service type into a distributed file system according to the sub-service type partitions, wherein different partitions store the log data of different sub-service types.
7. The method according to claim 6, wherein the loading the log data of each service type into the corresponding logical table respectively comprises any one of:
respectively loading the log data of each service type stored in the distributed file system into corresponding logic tables;
and respectively loading the log data stored in each partition into the corresponding logic table.
8. The method of claim 7, wherein the acquiring log data of the service type and the generating a corresponding service index based on the log data of the service type comprise any one of the following:
acquiring a logic table corresponding to the service type, and generating a corresponding service index based on the logic table corresponding to the service type, wherein the logic table corresponding to the service type comprises log data of the service type;
determining a sub-service type in the service types of the service indexes to be generated, acquiring a logic table corresponding to the sub-service type, and generating a corresponding service index based on the logic table corresponding to the sub-service type, wherein the logic table corresponding to the sub-service type comprises log data of the sub-service type.
9. The method of claim 2, wherein obtaining log data from a log collection server comprises:
when a change in the log data in the log collection server is detected, obtaining the changed log data through a log collection system Flume and uploading it to a message queue Kafka;
and pulling the changed log data from Kafka through Spark Streaming.
10. The method of claim 9, wherein after the obtaining the changed log data through the log collection system Flume, the method further comprises:
storing the changed log data to a distributed file system in real time through Flume and Kafka;
and performing data cleaning on the stored log data at preset time intervals through the distributed file system.
11. A data processing apparatus, comprising:
the determining module is used for determining the service type of the service index to be generated when a service index generating request is received;
the first acquisition module is used for acquiring the log data of the service types, and the log data of each type are obtained by classifying the log data collected by the log collection server according to different service types;
and the generating module is used for generating a corresponding service index based on the log data of the service type.
12. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the data processing method according to any one of claims 1 to 10.
13. A computer readable storage medium storing at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement a data processing method according to any one of claims 1 to 10.
CN201910990260.0A 2019-10-17 2019-10-17 Data processing method and device, electronic equipment and computer readable storage medium Active CN111796993B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910990260.0A CN111796993B (en) 2019-10-17 2019-10-17 Data processing method and device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910990260.0A CN111796993B (en) 2019-10-17 2019-10-17 Data processing method and device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111796993A true CN111796993A (en) 2020-10-20
CN111796993B CN111796993B (en) 2023-03-17

Family

ID=72805609

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910990260.0A Active CN111796993B (en) 2019-10-17 2019-10-17 Data processing method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111796993B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113094342A (en) * 2021-04-02 2021-07-09 上海中通吉网络技术有限公司 Data persistence method, device and equipment and storage medium
CN113568967A (en) * 2021-07-29 2021-10-29 掌阅科技股份有限公司 Dynamic extraction method of time sequence index data, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105608203A (en) * 2015-12-24 2016-05-25 Tcl集团股份有限公司 Internet of things log processing method and device based on Hadoop platform
CN106790572A (en) * 2016-12-27 2017-05-31 广州华多网络科技有限公司 The system and method that a kind of distributed information log is collected
US20180074852A1 (en) * 2016-09-14 2018-03-15 Salesforce.Com, Inc. Compact Task Deployment for Stream Processing Systems
CN107979477A (en) * 2016-10-21 2018-05-01 苏宁云商集团股份有限公司 A kind of method and system of business monitoring
CN109274540A (en) * 2018-11-16 2019-01-25 四川长虹电器股份有限公司 A kind of web access log processing method based on storm

Also Published As

Publication number Publication date
CN111796993B (en) 2023-03-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant