CN111796993B - Data processing method and device, electronic equipment and computer readable storage medium - Google Patents



Publication number
CN111796993B
Authority
CN
China
Prior art keywords
log data
service type
service
data
log
Legal status
Active
Application number
CN201910990260.0A
Other languages
Chinese (zh)
Other versions
CN111796993A (en)
Inventor
陈必成
林顺
Current Assignee
Xiamen Yaji Software Co Ltd
Original Assignee
Xiamen Yaji Software Co Ltd
Application filed by Xiamen Yaji Software Co Ltd
Priority to CN201910990260.0A
Publication of CN111796993A
Application granted
Publication of CN111796993B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3438 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment; monitoring of user actions

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiments of the application provide a data processing method and device, electronic equipment and a computer readable storage medium, and relate to the field of big data processing. The method comprises: when a service index generation request is received, determining the service type of the service index to be generated; obtaining the log data of that service type, where the log data of each type is obtained in advance by classifying the log data collected by a log collection server according to service type; and generating the corresponding service index based on the log data of that service type. The embodiments reduce the cost and complexity of data processing and shorten the time needed to obtain index data, which improves the user experience.

Description

Data processing method and device, electronic equipment and computer readable storage medium
Technical Field
The present application relates to the field of big data technologies, and in particular, to a data processing method, an apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development of information technology, big data processing has developed as well. To record information such as a user's daily operations, a system generates and stores operation logs for the user, and can later analyze these operation logs to determine various kinds of index data.
Currently, index data is typically determined from a user's operation logs as follows: each time one type of index data is to be determined, all stored logs are traversed, and the logs related to that type of index are identified and analyzed. Because the volume of user operation logs is huge, traversing and analyzing all logs for every type of index data makes data processing costly and complex, makes obtaining index data slow, and results in a poor user experience.
Disclosure of Invention
The application provides a data processing method, a data processing device, an electronic device and a computer readable storage medium, which can solve at least one of the technical problems described above. The technical solution is as follows:
In a first aspect, a data processing method is provided, the method including:
when a service index generation request is received, determining the service type of the service index to be generated;
acquiring the log data of the service type, wherein the log data of each type is obtained by classifying the log data collected by a log collection server according to different service types;
and generating a corresponding service index based on the log data of the service type.
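The three steps of the first aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the in-memory dictionary stands in for the pre-classified log store, and the record fields (user_id, amount) and the two index definitions are hypothetical.

```python
# Hypothetical pre-classified store: service type -> log records of that type.
# In the described method this classification is done before any request arrives.
CLASSIFIED_LOGS = {
    "user": [{"user_id": "u1"}, {"user_id": "u2"}, {"user_id": "u1"}],
    "consumption": [{"user_id": "u1", "amount": 30}, {"user_id": "u2", "amount": 12}],
}

# Hypothetical index generators, one per service type.
INDEX_GENERATORS = {
    "user": lambda logs: {"distinct_users": len({r["user_id"] for r in logs})},
    "consumption": lambda logs: {"total_amount": sum(r["amount"] for r in logs)},
}


def handle_index_request(request: dict) -> dict:
    # Step 1: determine the service type of the index to be generated.
    service_type = request["service_type"]
    # Step 2: acquire only the log data already classified under that type,
    # instead of traversing every stored log.
    logs = CLASSIFIED_LOGS.get(service_type, [])
    # Step 3: generate the service index from that subset.
    return INDEX_GENERATORS[service_type](logs)
```

The point of the design is that the per-request cost depends on the size of one class of logs rather than on the whole log store.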
In a possible implementation manner, before determining the service type of the service index to be generated, the method further includes:
acquiring log data from a log collection server, wherein the log data is generated by the log collection server based on a detected request in a preset format;
performing preset processing on the acquired log data to obtain the log data of each service type;
and respectively loading the log data of each service type into the corresponding logic table.
In another possible implementation manner, performing preset processing on the acquired log data to obtain the log data of each service type includes:
intercepting the acquired log data according to a first preset rule to obtain intercepted multiple sections of log data;
decoding each section of log data in the intercepted plurality of sections of log data;
carrying out format conversion processing on each section of log data after decoding processing;
and performing data mapping on each section of log data after format conversion according to the service type to obtain the log data of each service type.
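The final data-mapping step can be pictured as grouping the format-converted records by service type. The sketch below assumes each record is already a key-value dictionary and that its service type sits in a field named service_type; that field name and the "unknown" fallback bucket are hypothetical, since the text does not say where the type is carried.

```python
from collections import defaultdict


def map_by_service_type(records, type_field="service_type", fallback="unknown"):
    """Group parsed log records by service type.

    `type_field` and `fallback` are illustrative assumptions; records
    without the field are collected under the fallback key.
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[record.get(type_field, fallback)].append(record)
    return dict(grouped)
```

Keeping unclassifiable records in a separate bucket, rather than dropping them, leaves them available for later inspection.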
In another possible implementation manner, before decoding each segment of the intercepted log data, the method further includes:
decrypting each intercepted segment of log data that contains the encryption identifier;
and decoding each segment of the intercepted log data includes:
decoding each segment of log data after decryption.
In another possible implementation manner, before decoding any one segment of the intercepted log data, the method further includes:
if that segment of log data contains the encryption identifier, decrypting that segment of log data.
In another possible implementation manner, after the preset processing is performed on the acquired log data to obtain the log data of each service type, the method further includes any one of the following:
storing the log data of each service type in a distributed file system according to service type;
storing the log data of each service type in a distributed file system in partitions by sub-service type, wherein different partitions store the log data of different sub-service types.
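The two storage options differ only in directory layout. A minimal sketch, assuming a hypothetical root path and borrowing the Hive-style key=value naming for sub-service-type partitions (the text does not specify a naming scheme):

```python
from posixpath import join
from typing import Optional


def storage_path(root: str, service_type: str, sub_type: Optional[str] = None) -> str:
    """Return the distributed-file-system directory for a batch of log data.

    Without a sub-service type, data is stored per service type; with one,
    each sub-service type gets its own partition directory. The root path
    and the `sub_type=` directory naming are assumptions.
    """
    path = join(root, service_type)
    if sub_type is not None:
        path = join(path, f"sub_type={sub_type}")
    return path
```

For example, storage_path("/warehouse/logs", "event", "app_exit") yields /warehouse/logs/event/sub_type=app_exit, a directory that can later be loaded as one table partition.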
In another possible implementation manner, loading the log data of each service type into the corresponding logic table includes any one of the following:
respectively loading the log data of each service type stored in the distributed file system into the corresponding logic table;
and respectively loading the log data stored in each partition into the corresponding logic table.
In another possible implementation manner, acquiring the log data of the service type and generating the corresponding service index based on the log data of the service type includes any one of the following:
acquiring a logic table corresponding to the service type, and generating a corresponding service index based on the logic table corresponding to the service type, wherein the logic table corresponding to the service type comprises log data of the service type;
determining a sub-service type in the service type of the service index to be generated, acquiring a logic table corresponding to the sub-service type, and generating a corresponding service index based on the logic table corresponding to the sub-service type, wherein the logic table corresponding to the sub-service type comprises log data of the sub-service type.
In another possible implementation manner, the obtaining of the log data from the log collection server includes:
when a change in the log data in the log collection server is detected, acquiring the changed log data through the log collection system Flume and uploading it to a message queue (Kafka);
and pulling the changed log data from Kafka through Spark Streaming.
In another possible implementation manner, after the changed log data is acquired through the log collection system Flume, the method further includes:
storing the changed log data in a distributed file system through Flume and Kafka;
and cleaning the stored log data in the distributed file system at preset time intervals.
In a second aspect, there is provided a data processing apparatus comprising:
the determining module is used for determining the service type of the service index to be generated when a service index generating request is received;
the first acquisition module is used for acquiring the log data of the service types, and the log data of each type are obtained by classifying the log data collected by the log collection server according to different service types;
and the generating module is used for generating a corresponding service index based on the log data of the service type.
In one possible implementation, the apparatus further includes: a second obtaining module, a processing module, and a loading module, wherein,
the second acquisition module is used for acquiring log data from the log collection server, and the log data is generated by the log collection server based on the detected request in the preset format;
the processing module is used for carrying out preset processing on the acquired log data to obtain the log data of each service type;
and the loading module is used for loading the log data of each service type into the corresponding logic table respectively.
In another possible implementation manner, when the processing module performs preset processing on the acquired log data to obtain log data corresponding to each service type, the processing module is specifically configured to:
intercepting the acquired log data according to a first preset rule to obtain intercepted multiple sections of log data;
decoding each section of log data in the intercepted plurality of sections of log data;
carrying out format conversion processing on each section of log data after decoding processing;
and performing data mapping on each section of log data after format conversion according to the service type to obtain the log data of each service type.
In another possible implementation manner, the processing module is further configured to: before decoding each segment of intercepted log data, decrypt each intercepted segment of log data that contains the encryption identifier;
when decoding each segment of intercepted log data, the processing module is specifically configured to decode each segment of log data after decryption.
In another possible implementation manner, the processing module is further configured to: before decoding any segment of intercepted log data, decrypt that segment if it contains the encryption identifier.
In another possible implementation manner, the apparatus further includes: a first storage module, wherein,
the first storage module is configured to store the log data of each service type in the distributed file system according to service type, or to store the log data of each service type in the distributed file system in partitions by sub-service type, wherein different partitions store the log data of different sub-service types.
In another possible implementation manner, when the loading module loads the log data of each service type into the logic table, the loading module is specifically configured to:
respectively loading the log data of each service type stored in the distributed file system into the corresponding logic table; or,
and respectively loading the log data stored in each partition into the corresponding logic table.
In another possible implementation manner, when acquiring the log data of the service type, the first acquiring module is specifically configured to: acquiring a logic table corresponding to the service type;
the generating module is specifically configured to, when generating a corresponding service index based on the log data of the service type: and generating a corresponding service index based on a logic table corresponding to the service type, wherein the logic table corresponding to the service type comprises log data of the service type.
In another possible implementation manner, when acquiring the log data of the service type, the first acquiring module is specifically configured to: determining a sub-service type in the service types of the service indexes to be generated, and acquiring a logic table corresponding to the sub-service type;
the generating module is specifically configured to, when generating a corresponding service index based on the log data of the service type: and generating a corresponding service index based on a logic table corresponding to the sub-service type, wherein the logic table corresponding to the sub-service type comprises log data of the sub-service type.
In another possible implementation manner, when the second obtaining module obtains the log data from the log collecting server, the second obtaining module is specifically configured to:
when a change in the log data in the log collection server is detected, acquire the changed log data through the log collection system Flume and upload it to a message queue (Kafka);
and pull the changed log data from Kafka through Spark Streaming.
In another possible implementation manner, the apparatus further includes: a second storage module and a data cleaning module, wherein,
the second storage module is configured to store the changed log data in the distributed file system in real time through Flume and Kafka;
and the data cleaning module is configured to clean the stored log data in the distributed file system at preset time intervals.
In a third aspect, an electronic device is provided, which includes:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to execute the operations corresponding to the data processing method according to the first aspect or any possible implementation manner of the first aspect.
In a fourth aspect, there is provided a computer readable storage medium having stored thereon at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement a data processing method as shown in the first aspect or any possible implementation manner of the first aspect.
The technical solution provided by the application has the following beneficial effects:
In the prior art, all log data must be traversed each time a type of index data is determined. In contrast, in the method and device of the application, the log data collected by the log collection server is classified in advance according to service type to obtain the log data of each type; when a service index generation request is received, the service type of the service index to be generated is determined, the log data of that service type is obtained, and the corresponding service index is generated from it. Because only the pre-classified log data of the relevant type needs to be obtained when a given service index is generated, rather than traversing all log data, the cost and complexity of data processing are reduced, the time to obtain index data is shortened, and the user experience is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flowchart of a data processing method according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an electronic device for data processing according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are illustrative and are only for the purpose of explaining the present application and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an", "said" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
To make the objects, technical solutions and advantages of the present application more clear, the following detailed description of the embodiments of the present application will be made with reference to the accompanying drawings.
The terms referred to in this application will first be introduced and explained:
Flume: a highly available, highly reliable, distributed system for collecting, aggregating and transmitting massive volumes of logs. Flume supports customizing various data senders in a log system for collecting data, and also provides the ability to perform simple processing on data and write it to various (customizable) data receivers.
Kafka: an open-source stream processing platform written in Scala and Java. Kafka is a high-throughput, distributed publish-subscribe messaging system that can handle all the action stream data of consumers on a website. Because of throughput requirements, such data is usually handled through log processing and log aggregation. The purpose of Kafka is to unify online and offline message processing through Hadoop's parallel loading mechanism, and to provide real-time messages through a cluster.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific examples. These several specific embodiments may be combined with each other below, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
An embodiment of the present application provides a data processing method, which may be executed by an electronic device. As shown in Fig. 1, the method includes:
Step S101, when a service index generation request is received, determining the service type of the service index to be generated.
In the embodiment of the present application, the service type of the service index to be generated, as determined from the service index generation request, may include one service type or at least two service types; the embodiments of the present application are not limited in this respect.
For example, the service type of the service index to be generated may be: a user class index, a consumption class index, or an event class index.
Step S102, acquiring the log data of the service type.
The log data of each type is obtained by classifying the log data collected by the log collection server according to different service types.
For the embodiment of the application, the log data collected by the log collection server can be classified in advance according to different service types. The specific implementation manner for classifying the log data collected by the log collection server according to different service types in the embodiment of the present application is described in detail in the following embodiments, and is not described herein again.
For example, log data collected by the log collection server is classified in advance according to a user class, a consumption class, and an event class, and log data related to a user operation, log data related to consumption, and log data related to an event are obtained.
When the log data collected by the log collection server is classified in advance by event type as in the foregoing example, it may be classified according to user-defined events or according to preset events; this embodiment is not limited in this respect. For example, the user-defined or preset events may include a boot event, an application exit event, and the like.
In the embodiment of the application, once the log data received by the log collection server has been classified by service type, the log data related to a given service type can be obtained directly to generate the corresponding service index.
Step S103, generating the corresponding service index based on the log data of the service type.
A specific example based on step S101, step S102, and step S103 is: when a service index generation request is received, determining that the service type of a service index to be generated is a user type, then acquiring log data related to the user, and then generating a user type index based on the acquired log data related to the user.
Compared with the prior art, in which all log data must be traversed each time a type of index data is determined, the data processing method of this embodiment classifies the log data collected by the log collection server in advance according to service type; when a service index generation request is received, it determines the service type of the service index to be generated, obtains the log data of that service type, and generates the corresponding service index from it. In other words, because collected log data is classified by service type in advance, generating a given service index only requires acquiring the pre-classified log data of that index's type rather than traversing all log data, which reduces the cost and complexity of data processing, shortens the time to obtain index data, and improves the user experience.
In a possible implementation manner of the embodiment of the present application, before the step S101, the method may further include: step Sa (not shown), step Sb (not shown), and step Sc (not shown), wherein,
Step Sa, acquiring log data from the log collection server.
The log data is generated by the log collection server based on a detected request in a preset format.
In the embodiment of the present application, before the log data is acquired from the log collection server, the method may further include: when the log collection server (Nginx) detects a request in the preset format, it generates a log based on the request and stores the generated log on a log server.
For example, when a request of the form https://logstorage.cocos.com/log/v1? + [URL-encoded JSON string] is detected, a log is generated based on the request data and stored on the log server; more specifically, the generated log may be stored in a file of /var/log/nginx/metrics-access.
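The request format above places a URL-encoded JSON document directly after the "?". The sketch below shows a client building such a URL and the collection side recovering the payload. Only the base URL comes from the text; the payload fields are hypothetical, and treating the entire query string as the encoded JSON (with no key=value pair) is an assumption drawn from the "+" notation.

```python
import json
from urllib.parse import quote, unquote, urlsplit

BASE_URL = "https://logstorage.cocos.com/log/v1?"


def build_log_request(payload: dict) -> str:
    # The query string is the whole URL-encoded JSON document.
    return BASE_URL + quote(json.dumps(payload, separators=(",", ":")), safe="")


def parse_log_request(url: str) -> dict:
    # Recover the JSON payload from the query part of the request URL.
    return json.loads(unquote(urlsplit(url).query))
```

Encoding with safe="" ensures characters such as "/", "?" and "#" inside the JSON cannot be mistaken for URL structure.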
As can be seen from the above, the log collection server generates log data; therefore, when a change in the log data in the log collection server is detected, for example when new log data is added, the log data is acquired from the log collection server.
Specifically, acquiring the log data from the log collection server includes: when a change in the log data in the log collection server is detected, acquiring the changed log data through the log collection system Flume and uploading it to a message queue (Kafka); and pulling the changed log data from Kafka through Spark Streaming.
Pulling the changed log data from Kafka through Spark Streaming may specifically include: Spark Streaming calls the Kafka application programming interface (API) to pull the changed log data from Kafka.
In the embodiment of the present application, after the changed log data is acquired through Flume and Kafka, the method may further include: storing the changed log data in a distributed file system through Flume and Kafka; and cleaning the stored log data in the distributed file system at preset time intervals.
Specifically, to avoid log data being unavailable during subsequent processing, a dump policy may be configured in advance in Flume and Kafka to back up the original log data, that is, the acquired log data is dumped to the distributed file system at specific time intervals. Further, the acquired log data may be dumped to a specific directory in the distributed file system, and the data in that directory can later be used for data backtracking and problem locating.
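The essence of pulling only changed log data is that the consumer remembers how far it has read. The sketch below illustrates that offset-tracking idea with an in-memory list standing in for a Kafka topic partition; it deliberately does not use the Kafka or Spark Streaming APIs, and both class names are hypothetical.

```python
class InMemoryTopic:
    """Append-only log standing in for a Kafka topic partition (illustration only)."""

    def __init__(self):
        self._records = []

    def append(self, record: str) -> None:
        # Producer side, e.g. Flume forwarding newly written log lines.
        self._records.append(record)

    def read_from(self, offset: int, max_records: int) -> list:
        return self._records[offset:offset + max_records]


class IncrementalPuller:
    """Consumer that keeps its offset so each pull returns only new records."""

    def __init__(self, topic: InMemoryTopic):
        self.topic = topic
        self.offset = 0

    def pull(self, max_records: int = 100) -> list:
        batch = self.topic.read_from(self.offset, max_records)
        self.offset += len(batch)  # advance past what was just read
        return batch
```

A real deployment would persist the offset (Kafka can store committed consumer offsets) so that a restarted job resumes where it stopped.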
For example, the acquired log data is transferred to the distributed file system every 60 seconds.
In the embodiment of the application, because the acquired log data is dumped to the distributed file system at regular intervals, the log data stored in the distributed file system is cleaned at preset intervals to avoid occupying a large amount of its storage space.
For example, log data stored in the last 60 days is cleaned up every 60 days.
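The 60-day example can be expressed as a scheduled check: once the interval since the previous cleanup has elapsed, every dump written since that cleanup is removed. The sketch models dumps by their dates only; how the dumps are named and actually deleted in the distributed file system is left open and is not specified by the text.

```python
from datetime import date, timedelta


def cleanup_due(last_cleanup: date, today: date, interval_days: int = 60) -> bool:
    # A cleanup runs once the preset interval has elapsed since the previous one.
    return today - last_cleanup >= timedelta(days=interval_days)


def dumps_to_clean(dump_dates, last_cleanup: date, today: date, interval_days: int = 60):
    """Return the dump dates to delete, or [] if the cleanup is not yet due.

    When due, every dump written since the previous cleanup (the last
    `interval_days` of data) is selected, matching the 60-day example.
    """
    if not cleanup_due(last_cleanup, today, interval_days):
        return []
    return sorted(d for d in dump_dates if last_cleanup <= d <= today)
```

A sliding retention window (deleting only dumps older than N days at every run) is a common alternative; it keeps recent backups available for backtracking at the cost of more frequent cleanup runs.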
Of course, acquiring the log data from the log collection server may alternatively include: when a change in the log data in the log collection server is detected, acquiring the changed log data through DIS; and pulling the changed log data from DIS by having Spark Streaming call the DIS application programming interface (API).
Further, when the changed log data is acquired through DIS, the log data may also be dumped to a loc/rowlog file in a specific directory.
DIS processes log data in a manner similar to Flume and Kafka as described above, so the details are not repeated here.
The embodiments of the present application are not limited to processing logs with Flume and Kafka or with DIS.
Step Sb, performing preset processing on the acquired log data to obtain the log data of each service type.
In the above embodiment, after the log data is acquired from the log collection server, it needs to be parsed and decoded to obtain the log data of each service type. Specifically, the acquired log data is parsed and decoded through Spark Streaming to obtain the log data of each service type.
The specific way of performing the preset processing on the acquired log data is described in detail in the following embodiments, and is not described herein again.
Step Sc, loading the log data of each service type into the corresponding logic table.
In the embodiment of the present application, after the log data of each type is obtained, it may be loaded into the corresponding logic table so that a subsequent analysis and processing device can obtain the latest log data. The logic table in the embodiment of the present application may be a Hive table.
For example, each type of log data may be loaded into the corresponding logic table through a DLI (or Hadoop-Hive) job script configured in the scheduling service DLF (or another job scheduling process). In the embodiment of the application, DLF (Data Lake Factory) provides a one-stop big data collaborative development platform on which a user can easily complete tasks such as data modeling, data integration, script development, job scheduling, and operation and maintenance monitoring; this greatly lowers the barrier to using big data and helps the user quickly build a big data processing center. Further, the embodiment of the present application is not limited to calling DLF to load log data into the corresponding logic table; any manner capable of loading log data into a logic table falls within the protection scope of the embodiment of the present application.
Hive in Hadoop-Hive is a data warehouse infrastructure built on Hadoop. It provides a set of tools for data extraction, transformation and loading (ETL), a mechanism for storing, querying and analyzing large-scale data stored in Hadoop.
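The text leaves the loading mechanism open. One standard way to attach files already in the distributed file system to a Hive table is HiveQL's LOAD DATA INPATH statement, which moves (not copies) the files into the table's storage. The helper below only builds such a statement; the table names and paths used with it are hypothetical, and the values are interpolated without escaping, so it is suitable only for trusted, internally generated paths.

```python
from typing import Optional


def load_data_statement(hdfs_dir: str, table: str, partition: Optional[dict] = None) -> str:
    """Build a HiveQL LOAD DATA statement for one service-type directory.

    With `partition`, the files are loaded into that partition of a
    partitioned table (e.g. one partition per sub-service type).
    """
    stmt = f"LOAD DATA INPATH '{hdfs_dir}' INTO TABLE {table}"
    if partition:
        spec = ", ".join(f"{k}='{v}'" for k, v in partition.items())
        stmt += f" PARTITION ({spec})"
    return stmt
```

A scheduled job (for instance one configured in DLF) could emit one such statement per service-type or sub-service-type directory.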
In another possible implementation manner of the embodiment of the present application, step Sb specifically may include: step Sb1 (not shown), step Sb2 (not shown), step Sb3 (not shown), and step Sb4 (not shown), wherein,
Step Sb1, intercepting the acquired log data according to a first preset rule to obtain multiple intercepted segments of log data.
Specifically, the acquired log data may be intercepted using the tab character \t as the delimiter.
For example, if the acquired log data is XXXX\tXXXX, it can be cut into two segments at \t.
Step Sb2, decoding each of the intercepted segments of log data.
In the embodiment of the present application, after the acquired log data is intercepted, each intercepted segment may be processed with urldecode to obtain decoded data, that is, the URL (Uniform Resource Locator) encoding applied to the data is reversed.
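Interception with \t as the identifier amounts to splitting each raw log line on tab characters. A minimal sketch, assuming one log record per line as read from a file:

```python
def intercept_segments(line: str, delimiter: str = "\t") -> list:
    # Strip the trailing newline left by line-oriented file reads,
    # then cut the record into segments at each delimiter.
    return line.rstrip("\n").split(delimiter)
```

Segments are returned in their original order, so later steps can address them by position.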
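The urldecode step reverses percent-encoding. A sketch using Python's standard library; choosing unquote_plus (which also turns "+" into a space, as form-style encoders and PHP's urldecode do) is an assumption about how the client encoded the data.

```python
from urllib.parse import unquote_plus


def decode_segment(segment: str) -> str:
    # Reverse URL (percent) encoding; '+' is mapped back to a space.
    # Use urllib.parse.unquote instead if '+' must stay literal.
    return unquote_plus(segment)
```

For example, decode_segment("%7B%22event%22%3A%22boot%22%7D") restores the JSON text {"event":"boot"}.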
Further, because a split segment may contain encrypted data, the method may also include, before step Sb2: decrypting each split segment of log data that contains an encryption identifier.
Specifically, it is determined whether each split segment contains an encryption identifier; for example, V1 may indicate that the segment is encrypted log data and V2 that it is unencrypted log data. The segments identified as encrypted are then decrypted, after which step Sb2 may specifically include: decoding each segment of log data after decryption.
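The identifier check can be sketched as below. The text only says that V1 marks an encrypted segment and V2 an unencrypted one; this sketch further assumes the flag is a two-character prefix of the segment, and it leaves the decryption algorithm, which the text does not specify, as a caller-supplied function. All names are hypothetical.

```python
from typing import Callable

ENCRYPTED_FLAG = "V1"  # from the text: V1 marks an encrypted segment
PLAIN_FLAG = "V2"      # from the text: V2 marks an unencrypted segment


def strip_flag_and_decrypt(segment: str, decrypt: Callable[[str], str]) -> str:
    """Remove the (assumed prefix) flag and decrypt the payload when it is flagged V1."""
    if segment.startswith(ENCRYPTED_FLAG):
        return decrypt(segment[len(ENCRYPTED_FLAG):])
    if segment.startswith(PLAIN_FLAG):
        return segment[len(PLAIN_FLAG):]
    return segment  # no recognised flag: pass the segment through unchanged


def decrypt_all(segments: list[str], decrypt: Callable[[str], str]) -> list[str]:
    """Batch variant: handle every segment first; decoding (step Sb2) runs afterwards."""
    return [strip_flag_and_decrypt(seg, decrypt) for seg in segments]
```

For a quick check one could pass a stand-in such as `lambda s: s[::-1]`, which is obviously not real decryption; in practice `decrypt` would wrap whatever cipher the log producer actually uses.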
The specific decoding method is described above and is not repeated here.
In the above embodiment, step Sb2 is executed after the encryption judgment and any necessary decryption have been performed on every split segment. Alternatively, the segments may be handled one at a time: the encryption judgment is made for a segment, the segment is decrypted if it is encrypted, and it is then decoded directly before the next segment is judged. That is, before any split segment of log data is decoded, the method further includes: if that segment contains the encryption identifier, decrypting it.
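The per-segment alternative can be sketched as a generator that finishes one segment (judge, decrypt if needed, decode) before looking at the next. As in the batch case, treating V1 as a prefix and passing `decrypt` in as a callable are assumptions; only the encryption flag is handled here, and URL decoding is done with `unquote_plus`.

```python
from typing import Callable, Iterable, Iterator
from urllib.parse import unquote_plus

ENCRYPTED_FLAG = "V1"  # assumed prefix marking an encrypted segment


def decrypt_then_decode_each(segments: Iterable[str],
                             decrypt: Callable[[str], str]) -> Iterator[str]:
    """Per-segment variant: each segment is judged, decrypted if flagged, then decoded."""
    for seg in segments:
        if seg.startswith(ENCRYPTED_FLAG):
            seg = decrypt(seg[len(ENCRYPTED_FLAG):])
        yield unquote_plus(seg)  # decode this segment before moving to the next one


# str.upper stands in for a real decrypt function purely for illustration.
print(list(decrypt_then_decode_each(["V1a%20b", "c%2Bd"], str.upper)))  # ['A B', 'c+d']
```

Because it is a generator, this variant never holds more than one in-flight segment, which is the practical difference from the batch ordering.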
Step Sb3: convert the format of each decoded segment of log data.
In the embodiments of the present application, after each segment has been decoded, it can be converted into JSON format. JSON (JavaScript Object Notation) is a lightweight data-interchange format that records data as key-value pairs.
Specifically, each decoded segment of log data may be converted into JSON format using fastjson.
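fastjson is a Java library; the Python sketch below uses the standard `json` module to play the same role. It assumes, without confirmation from the text, that each decoded segment is itself JSON text describing one record, so that "format conversion" amounts to parsing it into a key-value object. The name `to_record` is hypothetical.

```python
import json
from typing import Any


def to_record(decoded_segment: str) -> dict[str, Any]:
    """Parse one decoded segment into a key-value record."""
    record = json.loads(decoded_segment)
    if not isinstance(record, dict):
        # Downstream mapping expects key-value records, so arrays and scalars are rejected.
        raise ValueError("segment is not a JSON object")
    return record


print(to_record('{"type": "user", "uid": 7}'))  # {'type': 'user', 'uid': 7}
```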
Step Sb4: map each format-converted segment of log data by service type to obtain the log data of each service type.
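Data mapping by service type is essentially a group-by. The sketch assumes each record carries its service type in a field named `type` and routes records without it to an `unknown` bucket; both the field name and the fallback are illustrative choices, not taken from the text.

```python
from collections import defaultdict
from typing import Any


def map_by_service_type(records: list[dict[str, Any]],
                        type_field: str = "type") -> dict[str, list[dict[str, Any]]]:
    """Group format-converted records into one list per service type."""
    buckets = defaultdict(list)
    for rec in records:
        # Records keep their original order within each service-type bucket.
        buckets[rec.get(type_field, "unknown")].append(rec)
    return dict(buckets)
```

Grouping once at ingestion is what later lets an index be computed from a single bucket instead of a scan over all log data.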
In another possible implementation of the embodiments of the present application, once the log data of each service type has been obtained, it needs to be stored for further processing, so step Sb may further include: storing the log data of each service type in a distributed file system by service type; or storing the log data of each service type in the distributed file system in partitions by sub-service type.
Different partitions store the log data of different sub-service types.
Specifically, the data processed by the Spark Streaming program is written to specific storage directories under the init directory of the analytics bucket, such as user (user data), extended_detail (consumption details), event_detail (event details), and so on; the data under analyze/init is further classified by platform.
Of course, to obtain more fine-grained log data, the log data of each service type may be classified by sub-service type and stored in partitions in the distributed file system. In the embodiments of the present application, sub-service types are obtained by further subdividing the service types.
Specifically, the logs of each service type may be classified and stored in partitions according to project and time.
For example, user-class log data partitions are stored in the following directory:
bucket list/analyze/init/user/{service partition}/{date}
where the service partition can be any user-defined string.
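The directory layout above can be produced by a small helper. This sketch follows the `analyze/init/<service type>/{service partition}/{date}` pattern from the example, leaves out the bucket prefix, and assumes an ISO `YYYY-MM-DD` date string; the function name and the date format are assumptions.

```python
from datetime import date


def partition_path(service_type: str, service_partition: str, day: date,
                   root: str = "analyze/init") -> str:
    """Build the storage directory for one service type, service partition, and day."""
    # Mirrors .../analyze/init/user/{service partition}/{date}; ISO date format is assumed.
    return f"{root}/{service_type}/{service_partition}/{day.isoformat()}"


print(partition_path("user", "game_a", date(2019, 9, 17)))
# analyze/init/user/game_a/2019-09-17
```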
In this embodiment, the log data of each service type is stored in the distributed file system either by service type or in partitions by sub-service type. Accordingly, loading the log data of each service type into the corresponding logical table includes: loading the log data of each service type stored in the distributed file system into its corresponding logical table; or loading the log data stored in each partition into its corresponding logical table.
Specifically, Hadoop-Hive jobs configured in the job scheduling process can load the data directories generated in the file system each day into the corresponding logical tables by data partition.
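In Hive, one common way to attach a day's directory to a table is a `LOAD DATA INPATH ... INTO TABLE ... PARTITION (...)` statement, which moves the files into the table's partition; defining an external table and running `ALTER TABLE ... ADD PARTITION` is an alternative that leaves the files in place. The Python sketch below only renders such a statement for a scheduled job to submit; the table name and the `dt` partition column are hypothetical.

```python
def hive_load_statement(table: str, hdfs_dir: str, dt: str) -> str:
    """Render a HiveQL statement that loads one day's directory into a table partition."""
    # Without LOCAL, LOAD DATA INPATH moves the files from hdfs_dir into the partition.
    # Values are interpolated without escaping, so they must come from trusted configuration.
    return f"LOAD DATA INPATH '{hdfs_dir}' INTO TABLE {table} PARTITION (dt='{dt}')"


print(hive_load_statement("user_log", "/analyze/init/user/game_a/2019-09-17", "2019-09-17"))
```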
In the above embodiment, either the log data of each service type or the log data stored in each partition may be loaded into a corresponding logical table, where the log data stored in each partition is the log data of one sub-type of a service type. Therefore, steps S102 and S103 may specifically include either step S1021 and step S1031 (not shown in the figures), or step S1022 and step S1032 (not shown in the figures), wherein:
Step S1021: acquire the logical table corresponding to the service type.
Step S1031: generate the corresponding service index based on the logical table corresponding to the service type.
The logical table corresponding to the service type contains the log data of that service type.
Step S1022: determine the sub-service type, within the service type, of the service index to be generated, and acquire the logical table corresponding to the sub-service type.
Step S1032: generate the corresponding service index based on the logical table corresponding to the sub-service type.
The logical table corresponding to the sub-service type contains the log data of that sub-service type.
In the embodiments of the present application, if the service index requested in the service index generation request is a service index of a whole service type (for example, a user-class service index), or if the distributed file system stores data only by service type without sub-type partitions, steps S1021 and S1031 may be executed to obtain the service index. If the requested service index is one for a sub-type within a service type (for example, the user-class service index for a particular date such as 2019.09.17), or if the distributed file system stores data partitioned by sub-service type, steps S1022 and S1032 may be executed. The embodiments are not limited to these cases.
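The choice between steps S1021/S1031 and S1022/S1032 can be sketched as a lookup that depends on whether the request names a sub-service type. Here logical tables are modelled as in-memory lists of rows, and the "service index" is whatever metric function is passed in (a row count by default); these stand-ins are assumptions for illustration, not the patent's query layer.

```python
from typing import Callable, Optional

Table = list[dict]  # hypothetical stand-in for a logical table: a list of rows


def generate_index(tables_by_type: dict[str, Table],
                   tables_by_subtype: dict[tuple[str, str], Table],
                   service_type: str,
                   sub_type: Optional[str] = None,
                   metric: Callable[[Table], int] = len) -> int:
    """Compute a service index from the logical table that matches the request."""
    if sub_type is None:
        table = tables_by_type[service_type]                 # steps S1021 and S1031
    else:
        table = tables_by_subtype[(service_type, sub_type)]  # steps S1022 and S1032
    return metric(table)


by_type = {"user": [{"uid": 1}, {"uid": 2}, {"uid": 3}]}
by_sub = {("user", "2019-09-17"): [{"uid": 2}]}
print(generate_index(by_type, by_sub, "user"))                # 3
print(generate_index(by_type, by_sub, "user", "2019-09-17"))  # 1
```

Either branch reads exactly one pre-built table, which is the property the embodiment relies on to avoid traversing all log data.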
In the above embodiments, processing the log data and generating the corresponding service indices take place in the logic layer of the data architecture; once generated, the service indices may be displayed in a preset format in the presentation layer.
For example, the user index, consumption index, event index, and so on may be displayed in the presentation layer in Excel format.
The foregoing embodiments describe the data processing method from the perspective of the method flow; the following embodiments describe the data processing apparatus from the perspective of virtual modules or virtual units, in detail as follows:
An embodiment of the present application provides a data processing apparatus. As shown in Fig. 2, the data processing apparatus 20 may include a determining module 21, a first obtaining module 22, and a generating module 23, wherein:
the determining module 21 is configured to determine, when a service index generation request is received, the service type of the service index to be generated;
the first obtaining module 22 is configured to obtain the log data of the service type,
where the log data of each type is obtained by classifying the log data collected by the log collection server according to service type;
and the generating module 23 is configured to generate a corresponding service index based on the log data of the service type.
In a possible implementation of the embodiments of the present application, the apparatus 20 further includes a second obtaining module, a processing module, and a loading module, wherein:
the second obtaining module is configured to acquire log data from the log collection server,
where the log data is generated by the log collection server based on detected requests in a preset format.
The first obtaining module 22 and the second obtaining module may be the same obtaining module or different obtaining modules; the embodiments of the present application are not limited in this respect.
The processing module is configured to perform preset processing on the acquired log data to obtain the log data of each service type;
and the loading module is configured to load the log data of each service type into the corresponding logical table respectively.
In another possible implementation of the embodiments of the present application, when performing preset processing on the acquired log data to obtain the log data of each service type, the processing module is specifically configured to:
split the acquired log data according to a first preset rule to obtain multiple segments of log data;
decode each of the split segments of log data;
convert the format of each decoded segment of log data;
and map each format-converted segment of log data by service type to obtain the log data of each service type.
In another possible implementation of the embodiments of the present application, the processing module is further configured to decrypt, before the split segments of log data are decoded, each split segment that contains an encryption identifier; when decoding the split segments, the processing module is specifically configured to decode each segment of log data after decryption.
In another possible implementation of the embodiments of the present application, the processing module is further configured to decrypt any split segment of log data that contains an encryption identifier before that segment is decoded.
In another possible implementation of the embodiments of the present application, the apparatus 20 further includes a first storage module, wherein:
the first storage module is configured to store the log data of each service type in the distributed file system by service type, or to store the log data of each service type in the distributed file system in partitions by sub-service type, with different partitions storing the log data of different sub-service types.
In another possible implementation of the embodiments of the present application, when loading the log data of each service type into the corresponding logical tables, the loading module is specifically configured to: load the log data of each service type stored in the distributed file system into its corresponding logical table; or load the log data stored in each partition into its corresponding logical table.
In another possible implementation of the embodiments of the present application, when acquiring the log data of the service type, the first obtaining module 22 is specifically configured to acquire the logical table corresponding to the service type; and when generating the corresponding service index based on the log data of the service type, the generating module 23 is specifically configured to generate the corresponding service index based on the logical table corresponding to the service type, where that logical table contains the log data of the service type.
In another possible implementation of the embodiments of the present application, when acquiring the log data of the service type, the first obtaining module 22 is specifically configured to determine the sub-service type, within the service type, of the service index to be generated and acquire the logical table corresponding to the sub-service type; and when generating the corresponding service index based on the log data of the service type, the generating module 23 is specifically configured to generate the corresponding service index based on the logical table corresponding to the sub-service type, where that logical table contains the log data of the sub-service type.
In another possible implementation of the embodiments of the present application, when acquiring log data from the log collection server, the second obtaining module is specifically configured to:
when a change in the log data on the log collection server is detected, acquire the changed log data through the log collection system Flume and upload it to the message queue Kafka;
and pull the changed log data from Kafka through Spark Streaming.
In another possible implementation of the embodiments of the present application, the apparatus 20 further includes a second storage module and a data cleaning module, wherein:
the second storage module is configured to store the changed log data in the distributed file system in real time through Flume and Kafka;
and the data cleaning module is configured to clean the stored log data in the distributed file system at preset time intervals.
In the embodiments of the present application, the first storage module and the second storage module may be the same storage module or different storage modules; this is not limited in the embodiments of the present application.
Compared with the prior art, in which all log data must be traversed whenever any type of index data is determined, the data processing apparatus of the embodiments of the present application determines, when a service index generation request is received, the service type of the service index to be generated; it then acquires the log data of that service type, where the log data of each type is obtained by classifying the log data collected by the log collection server according to service type; and it then generates the corresponding service index based on the log data of that service type. Because the collected log data is classified by service type in advance, generating a given service index only requires fetching the pre-classified log data of that index's service type rather than traversing all log data. This reduces the cost and complexity of data processing, shortens the time needed to obtain index data, and thereby improves user experience.
The data processing apparatus of this embodiment can execute the data processing method provided in the foregoing embodiments; the implementation principles are similar and are not repeated here.
The foregoing embodiments describe the data processing method from the perspective of the method flow and the data processing apparatus from the perspective of virtual modules or virtual units. The following embodiment describes an electronic device, which may be a cloud device, a local server, or a terminal device, and which can be used to perform the operations corresponding to the data processing method in the foregoing method embodiments, in detail as follows:
An embodiment of the present application provides an electronic device. As shown in Fig. 3, the electronic device 3000 includes a processor 3001 and a memory 3003, where the processor 3001 is coupled to the memory 3003, for example via a bus 3002. Optionally, the electronic device 3000 may further include a transceiver 3004. It should be noted that in practical applications the number of transceivers 3004 is not limited to one, and the structure of the electronic device 3000 does not constitute a limitation on the embodiments of the present application.
The processor 3001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor 3001 may also be a combination that implements computing functions, for example one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 3002 may include a path for conveying information between the foregoing components. The bus 3002 may be a PCI bus, an EISA bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in Fig. 3, but this does not mean that there is only one bus or one type of bus.
The memory 3003 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 3003 stores the application program code for executing the solution of the present application, and its execution is controlled by the processor 3001. The processor 3001 executes the application program code stored in the memory 3003 to implement any of the method embodiments shown above.
An embodiment of the present application provides an electronic device including a memory and a processor, with at least one program stored in the memory for execution by the processor. When executed by the processor, the program implements the following: when a service index generation request is received, the service type of the service index to be generated is determined; log data of that service type is then acquired, where the log data of each type is obtained by classifying the log data collected by the log collection server according to service type; and the corresponding service index is generated based on the log data of that service type. Because the collected log data is classified by service type in advance, generating a given service index only requires fetching the pre-classified log data of that service type rather than traversing all log data, which reduces the cost and complexity of data processing, shortens the time needed to obtain index data, and improves user experience.
An embodiment of the present application provides a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to execute the corresponding content of the foregoing method embodiments. Compared with the prior art, in the embodiments of the present application, when a service index generation request is received, the service type of the service index to be generated is determined; log data of that service type is then acquired, where the log data of each type is obtained by classifying the log data collected by the log collection server according to service type; and the corresponding service index is generated based on the log data of that service type. Because the collected log data is classified by service type in advance, generating a given service index only requires fetching the pre-classified log data of that service type rather than traversing all log data, which reduces the cost and complexity of data processing, shortens the time needed to obtain index data, and improves user experience.
It should be understood that, although the steps in the flowcharts of the figures are displayed in sequence as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are performed, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times; nor are they necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that those of ordinary skill in the art can make various improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.

Claims (11)

1. A data processing method, comprising:
acquiring log data from a log collection server, wherein the log data is generated by the log collection server based on detected requests in a preset format;
performing preset processing on the acquired log data to obtain log data of each service type;
loading the log data of each service type into a corresponding logical table respectively;
when a service index generation request is received, determining the service type of a service index to be generated;
acquiring the log data of the service type, wherein the log data of each service type is obtained by splitting, decoding, and format-converting the log data collected by the log collection server and mapping each format-converted segment of log data according to service type;
generating a corresponding service index based on the log data of the service type;
wherein acquiring the log data of the service type and generating the corresponding service index based on the log data of the service type comprise:
if the service index to be generated is a service index of a service type, acquiring the logical table corresponding to the service type, and generating the corresponding service index based on the logical table corresponding to the service type, wherein the logical table corresponding to the service type contains the log data of the service type;
if the service index to be generated is a service index of a sub-type under the service type, determining the sub-service type, within the service type, of the service index to be generated, acquiring the logical table corresponding to the sub-service type, and generating the corresponding service index based on the logical table corresponding to the sub-service type, wherein the logical table corresponding to the sub-service type contains the log data of the sub-service type.
2. The method according to claim 1, wherein performing the preset processing on the acquired log data to obtain the log data of each service type comprises:
splitting the acquired log data according to a first preset rule to obtain multiple segments of log data;
decoding each of the split segments of log data;
performing format conversion on each decoded segment of log data;
and mapping each format-converted segment of log data according to service type to obtain the log data of each service type.
3. The method according to claim 2, further comprising, before decoding the split segments of log data:
decrypting each split segment of log data that contains an encryption identifier;
wherein decoding each split segment of log data comprises:
decoding each segment of log data after decryption.
4. The method according to claim 3, further comprising, before decoding any split segment of log data:
if that segment of log data contains the encryption identifier, decrypting that segment of log data.
5. The method according to any one of claims 1 to 4, further comprising, after performing the preset processing on the acquired log data to obtain the log data of each service type, any one of the following:
storing the log data of each service type in a distributed file system by service type;
storing the log data of each service type in a distributed file system in partitions by sub-service type, wherein different partitions store the log data of different sub-service types.
6. The method according to claim 5, wherein loading the log data of each service type into the corresponding logical table respectively comprises any one of the following:
loading the log data of each service type stored in the distributed file system into the corresponding logical table respectively;
loading the log data stored in each partition into the corresponding logical table respectively.
7. The method according to claim 1, wherein acquiring log data from a log collection server comprises:
when a change in the log data on the log collection server is detected, acquiring the changed log data through the log collection system Flume and uploading it to the message queue Kafka;
and pulling the changed log data from Kafka through Spark Streaming.
8. The method according to claim 7, wherein acquiring the changed log data through the log collection system Flume further comprises:
storing the changed log data in a distributed file system in real time through Flume and Kafka;
and performing data cleaning on the stored log data in the distributed file system at preset time intervals.
9. A data processing apparatus, comprising:
a second acquisition module, configured to acquire log data from a log collection server, wherein the log data is generated by the log collection server based on detected requests in a preset format;
a processing module, configured to perform preset processing on the acquired log data to obtain log data of each service type;
a loading module, configured to load the log data of each service type into a corresponding logical table respectively;
a determining module, configured to determine, when a service index generation request is received, the service type of a service index to be generated;
a first acquisition module, configured to acquire the log data of the service type, wherein the log data of each service type is obtained by splitting, decoding, and format-converting the log data collected by the log collection server and mapping each format-converted segment of log data according to service type;
a generating module, configured to generate a corresponding service index based on the log data of the service type;
wherein, if the service index to be generated is a service index of a service type, the first acquisition module, when acquiring the log data of the service type, is specifically configured to acquire the logical table corresponding to the service type, and the generating module, when generating the corresponding service index based on the log data of the service type, is specifically configured to generate the corresponding service index based on the logical table corresponding to the service type, the logical table corresponding to the service type containing the log data of the service type;
if the service index to be generated is a service index of a sub-type under the service type, the first acquisition module is specifically configured to determine the sub-service type, within the service type, of the service index to be generated and acquire the logical table corresponding to the sub-service type, and the generating module is specifically configured to generate the corresponding service index based on the logical table corresponding to the sub-service type, the logical table corresponding to the sub-service type containing the log data of the sub-service type.
10. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the data processing method according to any one of claims 1 to 8.
11. A computer readable storage medium, characterized in that it stores at least one instruction, at least one program, set of codes or set of instructions, which is loaded and executed by a processor to implement a data processing method according to any one of claims 1 to 8.
CN201910990260.0A 2019-10-17 2019-10-17 Data processing method and device, electronic equipment and computer readable storage medium Active CN111796993B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910990260.0A CN111796993B (en) 2019-10-17 2019-10-17 Data processing method and device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111796993A CN111796993A (en) 2020-10-20
CN111796993B true CN111796993B (en) 2023-03-17

Family

ID=72805609

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910990260.0A Active CN111796993B (en) 2019-10-17 2019-10-17 Data processing method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111796993B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113094342A (en) * 2021-04-02 2021-07-09 上海中通吉网络技术有限公司 Data persistence method, device and equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105608203A (en) * 2015-12-24 2016-05-25 Tcl集团股份有限公司 Internet of things log processing method and device based on Hadoop platform
CN106790572A (en) * 2016-12-27 2017-05-31 广州华多网络科技有限公司 The system and method that a kind of distributed information log is collected
CN107979477A (en) * 2016-10-21 2018-05-01 苏宁云商集团股份有限公司 A kind of method and system of business monitoring
CN109274540A (en) * 2018-11-16 2019-01-25 四川长虹电器股份有限公司 A kind of web access log processing method based on storm

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10275278B2 (en) * 2016-09-14 2019-04-30 Salesforce.Com, Inc. Stream processing task deployment using precompiled libraries

Also Published As

Publication number Publication date
CN111796993A (en) 2020-10-20

Similar Documents

Publication Publication Date Title
CN107577805B (en) Business service system for log big data analysis
US11086688B2 (en) Managing resource allocation in a stream processing framework
US20230004434A1 (en) Automated reconfiguration of real time data stream processing
CN109074377B (en) Managed function execution for real-time processing of data streams
US9946593B2 (en) Recovery strategy for a stream processing system
EP3342137B1 (en) Edge intelligence platform, and internet of things sensor streams system
US9965330B2 (en) Maintaining throughput of a stream processing framework while increasing processing load
US11755452B2 (en) Log data collection method based on log data generated by container in application container environment, log data collection device, storage medium, and log data collection system
US9418085B1 (en) Automatic table schema generation
CN109446274B (en) Method and device for managing BI metadata of big data platform
KR101656360B1 (en) Cloud System for supporting auto-scaled Hadoop Distributed Parallel Processing System
Freire et al. Survey on the run‐time systems of enterprise application integration platforms focusing on performance
CN111431926A (en) Data association analysis method, system, equipment and readable storage medium
CN111177237B (en) Data processing system, method and device
CN115237857A (en) Log processing method and device, computer equipment and storage medium
Akanbi ESTemd: A distributed processing framework for environmental monitoring based on Apache Kafka streaming engine
CN111796993B (en) Data processing method and device, electronic equipment and computer readable storage medium
CN108595480B (en) Big data ETL tool system based on cloud computing and application method
Ikhlaq et al. Computation of Big Data in Hadoop and Cloud Environment
US9912545B2 (en) High performance topology resolution for non-instrumented nodes
CN110019045B (en) Log floor method and device
CN115514618A (en) Alarm event processing method and device, electronic equipment and medium
US8495033B2 (en) Data processing
Plaza-Martín et al. Analyzing network log files using big data techniques
CN117435367B (en) User behavior processing method, device, equipment, storage medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant