CN111796993A - Data processing method and device, electronic equipment and computer readable storage medium - Google Patents
- Publication number
- CN111796993A
- Authority
- CN
- China
- Prior art keywords
- log data
- service
- service type
- data
- log
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3438—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment monitoring of user actions
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
The embodiment of the application provides a data processing method and device, electronic equipment and a computer readable storage medium, and relates to the field of big data processing. The method comprises the following steps: when a service index generation request is received, determining the service type of a service index to be generated, then acquiring log data of the service type, wherein the log data of each type are obtained by classifying the log data collected by a log collection server according to different service types, and then generating a corresponding service index based on the log data of the service type. According to the embodiment of the application, the cost and the complexity of data processing are reduced, the time for obtaining the index data can be reduced, and the user experience is further improved.
Description
Technical Field
The present application relates to the field of big data technologies, and in particular, to a data processing method, an apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development of information technology, the field of big data processing is also developed, and in order to record information such as daily operation behaviors of a user, a system generates and stores an operation log of the user. The follow-up system can also analyze the operation log of the user to determine various index data.
Currently, the method of analyzing the operation log of the user to determine various index data generally includes: and when each type of index data is determined, traversing all the stored logs, determining the logs related to the type of index and analyzing to determine each type of index data. However, because the data volume of the stored operation logs of the user is huge, all the logs need to be traversed and analyzed when determining each type of index data, the data processing cost is high, the complexity is high, the time for obtaining the index data is long, and the user experience is poor.
Disclosure of Invention
The application provides a data processing method, a data processing device, an electronic device and a computer readable storage medium, which can solve at least one technical problem. The technical scheme is as follows:
in a first aspect, a data processing method is provided, and the method includes:
when a service index generation request is received, determining the service type of a service index to be generated;
acquiring log data of service types, wherein the log data of each type are obtained by classifying the log data collected by a log collection server according to different service types;
and generating a corresponding service index based on the log data of the service type.
In a possible implementation manner, before determining the service type of the service index to be generated, the method further includes:
acquiring log data from a log collection server, wherein the log data is generated by the log collection server based on a detected request in a preset format;
performing preset processing on the acquired log data to obtain the log data of each service type;
and respectively loading the log data of each service type into the corresponding logic table.
In another possible implementation manner, performing the preset processing on the acquired log data to obtain the log data of each service type includes:
intercepting the acquired log data according to a first preset rule to obtain intercepted multiple sections of log data;
decoding each section of log data in the intercepted plurality of sections of log data;
carrying out format conversion processing on each section of log data after decoding processing;
and performing data mapping on each section of log data after format conversion according to the service type to obtain the log data of each service type.
In another possible implementation manner, before decoding each section of intercepted log data, the method further includes:
decrypting the log data that contains the encryption identifier among the intercepted sections of log data;
wherein decoding each section of intercepted log data includes:
decoding each section of log data after decryption.
In another possible implementation manner, before decoding any section of intercepted log data, the method further includes:
if that section of log data contains the encryption identifier, decrypting that section of log data.
In another possible implementation manner, after performing the preset processing on the acquired log data to obtain the log data of each service type, the method further includes any one of the following:
storing the log data of each service type to a distributed file system according to the service type;
and storing the log data of each service type into a distributed file system according to the sub-service type partitions, wherein different partitions store the log data of different sub-service types.
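The two storage options above can be sketched as a directory layout. This is only an illustration under stated assumptions: a local temporary directory stands in for the distributed file system, and the `service_type=` / `sub_type=` directory names, the `part-00000.jsonl` file name, and the record fields are hypothetical, not taken from the application.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def partition_path(root: Path, service_type: str, sub_type: Optional[str] = None) -> Path:
    """Directory for a service type, optionally narrowed to a sub-service-type partition."""
    path = root / f"service_type={service_type}"
    if sub_type is not None:
        path = path / f"sub_type={sub_type}"
    return path


def store_record(root: Path, record: dict) -> Path:
    """Append one log record to the file of its (sub-)service-type directory."""
    target_dir = partition_path(root, record["service_type"], record.get("sub_type"))
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "part-00000.jsonl"
    with target.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return target


# A local temp directory stands in for the distributed file system (assumption).
root = Path(tempfile.mkdtemp())
written = [
    store_record(root, record)
    for record in [
        {"service_type": "user", "sub_type": "login", "uid": 1},
        {"service_type": "user", "sub_type": "register", "uid": 2},
        {"service_type": "consumption", "amount": 30},
    ]
]
```

With this layout, a reader that needs only one sub-service type can open a single partition directory instead of scanning the whole service type.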
In another possible implementation manner, respectively loading the log data of each service type into the corresponding logic table includes any one of the following:
respectively loading the log data of each service type stored in the distributed file system into corresponding logic tables;
and respectively loading the log data stored in each partition into the corresponding logic table.
In another possible implementation manner, acquiring the log data of the service type and generating a corresponding service index based on the log data of the service type includes any one of the following:
acquiring a logic table corresponding to the service type, and generating a corresponding service index based on the logic table corresponding to the service type, wherein the logic table corresponding to the service type comprises log data of the service type;
determining a sub-service type in the service type of the service index to be generated, acquiring a logic table corresponding to the sub-service type, and generating a corresponding service index based on the logic table corresponding to the sub-service type, wherein the logic table corresponding to the sub-service type comprises log data of the sub-service type.
In another possible implementation manner, the obtaining of the log data from the log collection server includes:
when a change in the log data in the log collection server is detected, obtaining the changed log data through the log collection system Flume and uploading it to the message queue Kafka;
and pulling the changed log data from Kafka through Spark-streaming.
In another possible implementation manner, after the changed log data is obtained through the log collection system Flume, the method further includes:
storing the changed log data to a distributed file system through Flume and Kafka;
and performing data cleaning on the stored log data at preset time intervals through the distributed file system.
In a second aspect, there is provided a data processing apparatus comprising:
the determining module is used for determining the service type of the service index to be generated when a service index generating request is received;
the first acquisition module is used for acquiring the log data of the service types, and the log data of each type are obtained by classifying the log data collected by the log collection server according to different service types;
and the generating module is used for generating a corresponding service index based on the log data of the service type.
In one possible implementation, the apparatus further includes: a second obtaining module, a processing module, and a loading module, wherein,
the second acquisition module is used for acquiring log data from the log collection server, and the log data is generated by the log collection server based on the detected request in the preset format;
the processing module is used for carrying out preset processing on the acquired log data to obtain the log data of each service type;
and the loading module is used for loading the log data of each service type into the corresponding logic table respectively.
In another possible implementation manner, when the processing module performs preset processing on the acquired log data to obtain log data corresponding to each service type, the processing module is specifically configured to:
intercepting the acquired log data according to a first preset rule to obtain intercepted multiple sections of log data;
decoding each section of log data in the intercepted plurality of sections of log data;
carrying out format conversion processing on each section of log data after decoding processing;
and performing data mapping on each section of log data after format conversion according to the service type to obtain the log data of each service type.
In another possible implementation manner, the processing module is further configured to: before decoding each section of intercepted log data, decrypting each section of intercepted log data containing the encrypted identification;
when the processing module decodes the intercepted log data, the processing module is specifically configured to: and decoding each section of log data after decryption.
In another possible implementation manner, the processing module is further configured to: before any section of intercepted log data is decoded, and when any section of log data contains an encryption identifier, any section of log data is decrypted.
In another possible implementation manner, the apparatus further includes: a first storage module, wherein,
the first storage module is used for storing the log data of each service type to the distributed file system according to the service type, or storing the log data of each service type to the distributed file system in a partition mode according to the sub-service type, and storing the log data of different sub-service types in different partitions.
In another possible implementation manner, when the loading module loads the log data of each service type into the logic table, the loading module is specifically configured to:
respectively loading the log data of each service type stored in the distributed file system into corresponding logic tables; or,
and respectively loading the log data stored in each partition into the corresponding logic table.
In another possible implementation manner, when acquiring the log data of the service type, the first acquiring module is specifically configured to: acquiring a logic table corresponding to the service type;
the generating module is specifically configured to, when generating a corresponding service index based on the log data of the service type: and generating a corresponding service index based on a logic table corresponding to the service type, wherein the logic table corresponding to the service type comprises log data of the service type.
In another possible implementation manner, when acquiring the log data of the service type, the first acquiring module is specifically configured to: determining a sub-service type in the service types of the service indexes to be generated, and acquiring a logic table corresponding to the sub-service type;
the generating module is specifically configured to, when generating a corresponding service index based on the log data of the service type: and generating a corresponding service index based on a logic table corresponding to the sub-service type, wherein the logic table corresponding to the sub-service type comprises log data of the sub-service type.
In another possible implementation manner, when the second obtaining module obtains the log data from the log collecting server, the second obtaining module is specifically configured to:
when a change in the log data in the log collection server is detected, obtaining the changed log data through the log collection system Flume and uploading it to the message queue Kafka;
and pulling the changed log data from Kafka through Spark-streaming.
In another possible implementation manner, the apparatus further includes: a second storage module and a data cleaning module, wherein,
the second storage module is used for storing the changed log data to the distributed file system in real time through Flume and Kafka;
and the data cleaning module is used for cleaning the stored log data at preset time intervals through the distributed file system.
In a third aspect, an electronic device is provided, which includes:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to: and executing the corresponding operation of the data processing method according to the first aspect or any possible implementation manner of the first aspect.
In a fourth aspect, there is provided a computer readable storage medium having stored thereon at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement the data processing method according to the first aspect or any possible implementation manner of the first aspect.
The technical solution provided by the present application brings the following beneficial effects:
Compared with the prior art, in which all log data must be traversed when determining each type of index data, the present application determines the service type of the service index to be generated when a service index generation request is received, then obtains the log data of that service type, where the log data of each type is obtained by classifying the log data collected by the log collection server according to different service types, and then generates the corresponding service index based on the log data of that service type. That is, the collected log data is classified by service type in advance; when a given service index is generated, only the log data of the corresponding service type needs to be obtained from the pre-classified log data, and not all log data needs to be traversed. This can reduce the cost and complexity of data processing, shorten the time needed to obtain the index data, and thereby improve the user experience.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flowchart of a data processing method according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an electronic device for data processing according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any one of, and all combinations of, one or more of the associated listed items.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The terms referred to in this application will first be introduced and explained:
Flume: a highly available, highly reliable, distributed system for collecting, aggregating, and transmitting massive volumes of logs. Flume supports customizing various data senders in the log system for collecting data; at the same time, Flume provides the ability to perform simple processing on data and write it to various (customizable) data recipients.
Kafka: an open-source stream processing platform written in Scala and Java. Kafka is a high-throughput distributed publish-subscribe messaging system that can handle all the action stream data of consumers on a website. Because of throughput requirements, such data is typically handled by processing logs and log aggregation. The purpose of Kafka is to unify online and offline message processing through Hadoop's parallel loading mechanism, and also to provide real-time messages through a cluster.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
An embodiment of the present application provides a data processing method, which may be executed by an electronic device. As shown in Fig. 1, the method includes:
step S101, when a service index generation request is received, determining the service type of a service index to be generated.
For the embodiment of the present application, the service type of the service index to be generated, as determined from the service index generation request, may include one service type or at least two service types. The embodiments of the present application are not limited in this respect.
For example, the service type of the service indicator to be generated may include: user class indicators, consumption class indicators, and event class indicators.
And step S102, acquiring the log data of the service type.
The log data of each type is obtained by classifying the log data collected by the log collection server according to different service types.
For the embodiment of the application, the log data collected by the log collection server can be classified according to different service types in advance. The specific implementation manner for classifying the log data collected by the log collection server according to different service types in the embodiment of the present application is described in detail in the following embodiments, and is not described herein again.
For example, log data collected by the log collection server is classified in advance according to a user class, a consumption class, and an event class, and log data related to a user operation, log data related to consumption, and log data related to an event are obtained.
In the foregoing embodiment, when the log data collected by the log collection server is classified in advance by event class, it may be classified according to user-defined events or according to preset events, which is not limited in this embodiment. For example, a user-defined or preset event may be any event such as a boot event or an application exit event.
For the embodiment of the application, after the log data received by the log collection server is classified according to different service types, the log data related to the service type can be directly obtained to obtain the corresponding service index.
And step S103, generating a corresponding service index based on the log data of the service type.
A specific example based on step S101, step S102, and step S103 is: when a service index generation request is received, determining that the service type of a service index to be generated is a user type, then acquiring log data related to the user, and then generating a user type index based on the acquired log data related to the user.
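Steps S101 to S103 can be sketched as a lookup followed by a computation over one pre-classified bucket. This is a minimal sketch, not the application's implementation: the in-memory `CLASSIFIED_LOGS` store, the index names, the record fields, and the request shape are all hypothetical.

```python
# Hypothetical pre-classified store: service type -> log records of that type.
CLASSIFIED_LOGS = {
    "user": [
        {"uid": "u1", "event": "login"},
        {"uid": "u2", "event": "login"},
        {"uid": "u1", "event": "logout"},
    ],
    "consumption": [{"uid": "u1", "amount": 30}, {"uid": "u2", "amount": 12}],
    "event": [{"uid": "u1", "name": "app_exit"}],
}


def index_active_users(records):
    """User-class index: number of distinct users seen in the user logs."""
    return len({record["uid"] for record in records})


def index_total_spend(records):
    """Consumption-class index: sum of all recorded amounts."""
    return sum(record["amount"] for record in records)


# Hypothetical registry: index name -> (service type, computation).
INDEX_REGISTRY = {
    "active_users": ("user", index_active_users),
    "total_spend": ("consumption", index_total_spend),
}


def generate_index(request):
    service_type, compute = INDEX_REGISTRY[request["index"]]  # S101: determine the type
    records = CLASSIFIED_LOGS[service_type]                   # S102: fetch only that type
    return compute(records)                                   # S103: generate the index
```

Only the bucket for the requested service type is read; the other buckets are never touched, which is the saving the method claims over traversing all logs.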
Compared with the prior art, in which all log data must be traversed when determining each type of index data, the data processing method of this embodiment determines the service type of the service index to be generated when a service index generation request is received, then obtains the log data of that service type, where the log data of each type is obtained by classifying the log data collected by the log collection server according to different service types, and then generates the corresponding service index based on the log data of that service type. In other words, in the embodiment of the present application, the collected log data is classified by service type in advance; when a given service index is generated, only the log data of the corresponding service type needs to be obtained from the pre-classified log data, and not all log data needs to be traversed. This can reduce the cost and complexity of data processing, shorten the time needed to obtain the index data, and thereby improve the user experience.
In a possible implementation manner of the embodiment of the present application, before the step S101, the method may further include: step Sa (not shown), step Sb (not shown), and step Sc (not shown), wherein,
and step Sa, acquiring log data from the log collection server.
Wherein the log data is generated by the log collection server based on the detected request in the preset format.
For the embodiment of the present application, before acquiring log data from the log collection server, the method may further include: the log collecting server (Nginx) generates a log based on a request in a preset format when detecting the request in the preset format, and stores the generated log to a log server (log server).
For example, when a request of the format https://logstorage.cos.com/log/v1? + [URL-encoded JSON string] is detected, a log is generated based on the request data and stored in the log server; more specifically, the generated log may be stored in a file of /var/log/nginx/metrics-access.
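A request in such a format, a base address followed by `?` and a URL-encoded JSON string, could be built and read back as sketched below. The base address `https://logs.example.com/log/v1` and the payload fields are placeholders chosen for illustration, not values from the application.

```python
import json
from urllib.parse import quote, unquote, urlsplit

BASE = "https://logs.example.com/log/v1"  # hypothetical address, for illustration only


def build_request_url(payload: dict) -> str:
    """Serialize the payload to JSON and append it, URL-encoded, after '?'."""
    body = json.dumps(payload, separators=(",", ":"))
    return BASE + "?" + quote(body, safe="")


def payload_from_url(url: str) -> dict:
    """Recover the JSON payload from the query part of a logged request."""
    return json.loads(unquote(urlsplit(url).query))


url = build_request_url({"type": "user", "uid": "u1"})
```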
From the above, it can be seen that: the log collection server may generate log data, and thus, when a change in the log data in the log collection server is detected, for example, when new log data is added, the log data is acquired from the log collection server.
Specifically, acquiring the log data from the log collection server includes: when a change in the log data in the log collection server is detected, obtaining the changed log data through the log collection system Flume and uploading it to the message queue Kafka; and pulling the changed log data from Kafka through Spark-streaming.
The pulling of the changed log data from the kafka through Spark-streaming may specifically include: changed log data is pulled from kafka through Spark-streaming calls to kafka Application Programming Interface (API).
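The collect-then-pull flow can be illustrated with a pure-Python stand-in. This sketch deliberately does not use Flume, Kafka, or Spark: a byte offset into the log file plays the role of change detection, and `queue.Queue` plays the role of the message queue. The class and function names are hypothetical.

```python
import queue
import tempfile
from pathlib import Path


class TailingCollector:
    """Stand-in for the collector: forwards only lines added since the last poll."""

    def __init__(self, path: Path, out_queue: "queue.Queue[str]"):
        self.path = path
        self.offset = 0  # bytes of the file already forwarded
        self.out = out_queue

    def poll(self) -> int:
        with self.path.open("rb") as fh:
            fh.seek(self.offset)
            chunk = fh.read()
        end = chunk.rfind(b"\n") + 1  # forward complete lines only
        lines = chunk[:end].decode("utf-8").splitlines()
        self.offset += end
        for line in lines:
            self.out.put(line)
        return len(lines)


def pull_batch(source: "queue.Queue[str]") -> list:
    """Stand-in for the streaming consumer: drain whatever is queued right now."""
    batch = []
    while True:
        try:
            batch.append(source.get_nowait())
        except queue.Empty:
            return batch


log_file = Path(tempfile.mkdtemp()) / "access.log"
log_file.write_text("a\nb\n", encoding="utf-8")
message_queue = queue.Queue()
collector = TailingCollector(log_file, message_queue)
first = collector.poll()           # forwards the two existing lines
with log_file.open("a", encoding="utf-8") as fh:
    fh.write("c\n")
second = collector.poll()          # forwards only the newly appended line
batch = pull_batch(message_queue)
```

The queue decouples the collector from the consumer, so the consumer can pull at its own pace without re-reading the log file.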
For the embodiment of the present application, after the changed log data is obtained through Flume and Kafka, the method may further include: storing the changed log data to a distributed file system through Flume and Kafka; and performing data cleaning on the stored log data at preset time intervals through the distributed file system.
Specifically, to avoid the problem that log data cannot be found during subsequent processing, a dump policy may be configured in Flume and Kafka in advance to back up the original log data, that is, the acquired log data is dumped to the distributed file system at specific time intervals. Further, the acquired log data can be dumped to a specific directory under the distributed file system, and the data under that directory is later used for data backtracking and problem location.
For example, the acquired log data is transferred to the distributed file system every 60 seconds.
For the embodiment of the application, because the acquired log data is dumped to the distributed file system at regular intervals, the log data stored in the distributed file system is cleaned at preset time intervals to avoid occupying a large amount of its storage space.
For example, log data stored in the last 60 days is cleaned up every 60 days.
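A periodic cleaning pass of this kind might look like the sketch below. It assumes one reading of the example, namely that backup files older than the retention period are deleted; a local directory stands in for the distributed file system, and the file names and the 60-day constant are illustrative.

```python
import os
import tempfile
import time
from pathlib import Path

RETENTION_SECONDS = 60 * 24 * 3600  # 60 days, following the example interval


def clean_expired(directory: Path, now=None, retention: float = RETENTION_SECONDS) -> list:
    """Delete backup files whose modification time is older than the retention period."""
    now = time.time() if now is None else now
    removed = []
    for path in sorted(directory.iterdir()):
        if path.is_file() and now - path.stat().st_mtime > retention:
            path.unlink()
            removed.append(path.name)
    return removed


# Demo on a local temp directory (stand-in for the distributed file system).
backup_dir = Path(tempfile.mkdtemp())
old_file = backup_dir / "rawlog-old"
new_file = backup_dir / "rawlog-new"
old_file.write_text("old\n", encoding="utf-8")
new_file.write_text("new\n", encoding="utf-8")
stale = time.time() - 61 * 24 * 3600
os.utime(old_file, (stale, stale))  # pretend this backup is 61 days old
removed = clean_expired(backup_dir)
```

In practice the pass would be triggered by a scheduler at the preset interval; the function itself is stateless, so running it more often than necessary is harmless.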
Of course, obtaining the log data from the log collection server may further include: when the log data in the log collection server is monitored to be changed, the changed log data is obtained through dis; the changed log data is pulled from the dis through a Spark-streaming calling dis Application Programming Interface (API).
Further, when log data that has changed is obtained by dis, the log data may also be transferred to a loc/rowlog file in a specific directory.
The way dis processes log data is similar to the way Flume and Kafka process logs, as described above, and is not repeated here.
The embodiment of the present application is not limited to the above-mentioned processing of the log by Flume and Kafka or the processing of the log by dis.
And Sb, carrying out preset processing on the acquired log data to obtain the log data of each service type.
In the above embodiment, after the log data is acquired from the log collection server, the log data needs to be analyzed and decoded to obtain the log data of each service type. Specifically, the acquired log data is analyzed and decoded through Spark-streaming, so that log data of each service type is obtained.
The specific way of performing the preset processing on the acquired log data is described in detail in the following embodiments, and is not described herein again.
And step Sc, loading the log data of each service type into a corresponding logic table respectively.
For the embodiment of the present application, after the log data of each type is obtained, the log data of each type may be loaded into the corresponding logic table, so that the subsequent analysis processing device may obtain the latest log data. The logic table in the embodiment of the present application may include: hive table.
For example, each type of log data may be loaded into the corresponding logic table by a job script DLI (or Hadoop-Hive) configured in a scheduling service DLF (or a job-scheduling process). In the embodiment of the application, DLF (Data Lake Factory) provides a one-stop big data collaborative development platform on which a user can easily complete tasks such as data modeling, data integration, script development, job scheduling, and operation and maintenance monitoring, which greatly lowers the threshold for using big data and helps the user quickly build a big data processing center. Further, the embodiment of the present application is not limited to invoking DLF to load log data into the corresponding logic table; any manner that can load log data into a logic table is within the protection scope of the embodiment of the present application.
Wherein, Hive in Hadoop-Hive is a data warehouse infrastructure established on Hadoop. It provides a set of tools that can be used to perform data Extraction Transformation Loading (ETL), a mechanism that can store, query, and analyze large-scale data stored in Hadoop.
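When the logic table is a Hive table, the load step can be expressed as a HiveQL `LOAD DATA INPATH ... INTO TABLE ... PARTITION (...)` statement. The helper below only builds the statement text; the table names, paths, and partition column are hypothetical, and how the statement is submitted (a scheduled job script or otherwise) is left open.

```python
from typing import Dict, Optional


def load_statement(hdfs_path: str, table: str, partition: Optional[Dict[str, str]] = None) -> str:
    """Build a HiveQL statement that moves files under hdfs_path into a table."""
    statement = f"LOAD DATA INPATH '{hdfs_path}' INTO TABLE {table}"
    if partition:
        spec = ", ".join(f"{column}='{value}'" for column, value in partition.items())
        statement += f" PARTITION ({spec})"
    return statement


# Hypothetical table and path names, one statement per service type or partition.
whole_type = load_statement("/logs/service_type=consumption", "consumption_log")
one_partition = load_statement(
    "/logs/service_type=user/sub_type=login", "user_log", {"sub_type": "login"}
)
```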
In another possible implementation manner of the embodiment of the present application, step Sb specifically may include: step Sb1 (not shown), step Sb2 (not shown), step Sb3 (not shown), and step Sb4 (not shown), wherein,
and step Sb1, intercepting the acquired log data according to a first preset rule to obtain intercepted multiple sections of log data.
Specifically, the acquired log data can be intercepted by taking \t as an identifier.
For example, if the acquired log data is XXXX\t×, it can be cut into two segments by using \t as the identifier.
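Step Sb1 can be sketched as follows. This is only an illustration under assumptions: each raw log line is taken to be a plain string, the tab character is taken to be the only identifier, and the function name `intercept` is hypothetical.

```python
def intercept(raw_line: str, delimiter: str = "\t") -> list[str]:
    """Cut one raw log line into segments at every delimiter."""
    # Strip only the trailing newline so that empty segments are preserved.
    return raw_line.rstrip("\n").split(delimiter)


segments = intercept("XXXX\tYYYY\n")
print(segments)
```

A line containing one tab yields two segments, matching the example in the text.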
And step Sb2, decoding each piece of log data in the plurality of pieces of captured log data.
For the embodiment of the present application, after the acquired log data is intercepted, the intercepted log data may be subjected to urldecode processing to obtain decoded data. Here, urldecode reverses the encoding scheme used for Uniform Resource Locators (URLs).
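The decoding in step Sb2 might look like the sketch below. It uses Python's standard `urllib.parse.unquote_plus`; treating `+` as an encoded space is an assumption, and the function name `decode_segment` is hypothetical.

```python
from urllib.parse import unquote_plus


def decode_segment(segment: str) -> str:
    """Reverse URL (percent) encoding; '+' is assumed to encode a space."""
    return unquote_plus(segment)


print(decode_segment("name%3Dalice%26city%3DNew+York"))
```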
Further, since there may be encrypted data in each intercepted piece of log data, before step Sb2, the method may further include: and carrying out decryption processing on the log data containing the encrypted identification in each section of the intercepted log data.
Specifically, it is determined whether each piece of intercepted log data includes an encrypted identifier. For example, V1 may indicate that the piece of log data is encrypted and V2 may indicate that it is unencrypted. After the encrypted log data is identified, it is decrypted. Accordingly, if some pieces of intercepted log data have been decrypted, step Sb2 may specifically include: decoding each piece of log data after decryption.
The specific decoding method is detailed above and is not described again.
In the above embodiment, step Sb2 is executed after the encryption judgment and the decryption processing have been performed on every piece of intercepted data. Alternatively, the judgment may be made piece by piece: if a piece of intercepted data is encrypted, it is decrypted and then directly decoded, after which the judgment is made for the next piece. That is, before the decoding processing is performed on any piece of intercepted log data, the method further includes: if that piece of log data contains the encryption identifier, decrypting that piece of log data.
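The piece-by-piece variant (judge, decrypt if needed, then decode) can be sketched as follows. Several details are assumptions, not taken from the source: the identifier is modeled as a prefix separated by `:`, and Base64 is used purely as a stand-in because the text names no cipher. A real decryption routine could be passed in through the hypothetical `decrypt` parameter.

```python
import base64
from urllib.parse import unquote_plus

ENCRYPTED_FLAG = "V1"  # per the text: V1 marks encrypted log data
PLAIN_FLAG = "V2"      # per the text: V2 marks unencrypted log data


def placeholder_decrypt(payload: str) -> str:
    # ASSUMPTION: Base64 is only a stand-in; the source names no cipher.
    return base64.b64decode(payload).decode("utf-8")


def decrypt_then_decode(segment: str, decrypt=placeholder_decrypt) -> str:
    # ASSUMPTION: the identifier is a prefix separated from the payload by ':'.
    flag, _, payload = segment.partition(":")
    if flag == ENCRYPTED_FLAG:
        payload = decrypt(payload)
    elif flag != PLAIN_FLAG:
        payload = segment  # no identifier found: treat the whole segment as plain
    return unquote_plus(payload)


encrypted = "V1:" + base64.b64encode(b"uid%3D42").decode("ascii")
print(decrypt_then_decode(encrypted))
print(decrypt_then_decode("V2:uid%3D7"))
```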
Step Sb3 performs format conversion processing on each piece of log data after the decoding processing.
For the embodiment of the application, after each piece of data is decoded, each piece of decoded data can be converted into the JSON format. JSON is short for JavaScript Object Notation and is a lightweight data representation format that records data as key-value pairs.
Specifically, each piece of decoded log data is converted into the JSON format by fastjson.
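The format conversion in step Sb3 can be sketched as follows. The source performs this step with a Java library; the sketch substitutes Python's standard `json` module, and it assumes that a decoded segment consists of `key=value` pairs joined by `&`, which the source does not state.

```python
import json


def to_json(decoded_segment: str) -> str:
    """Turn 'k1=v1&k2=v2' text into a JSON object of key-value pairs."""
    record = {}
    for pair in decoded_segment.split("&"):
        if not pair:
            continue  # skip empty pieces such as a trailing '&'
        key, _, value = pair.partition("=")
        record[key] = value
    return json.dumps(record, sort_keys=True)


print(to_json("uid=42&event=login"))
```

The pairs are split by hand rather than with a query-string parser so that the already decoded values are not percent-decoded a second time.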
And step Sb4, performing data mapping on each section of log data after format conversion according to the service type to obtain log data of each service type.
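The data mapping in step Sb4 can be illustrated by grouping the converted records by service type. The field name `type` and the fallback label `unknown` are hypothetical; the source does not say how a record carries its service type.

```python
import json
from collections import defaultdict


def map_by_service_type(json_records, type_key="type"):
    """Group JSON log records by the value of their service-type field."""
    grouped = defaultdict(list)
    for text in json_records:
        record = json.loads(text)
        # Records without the field are collected under a fallback label.
        grouped[record.get(type_key, "unknown")].append(record)
    return dict(grouped)


records = [
    '{"type": "user", "uid": "1"}',
    '{"type": "event", "uid": "1"}',
    '{"type": "user", "uid": "2"}',
]
grouped = map_by_service_type(records)
print(sorted(grouped))
```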
Another possible implementation manner of the embodiment of the present application, after obtaining the log data of each service type, the log data of each type needs to be stored to implement further processing, so that step Sb may further include: storing the log data of each service type to a distributed file system according to the service type; or storing the log data of each service type to the distributed file system according to the sub-service type partitions.
Wherein different partitions store log data for different sub-service types.
Specifically, the data processed by the Spark-streaming program is put into specific storage directories, such as user (user), consumption detail (depend_detail), event detail (event_detail), and the like, under initialization (init) under the analysis (analyze) bucket. Under analyze/init, the data is further classified according to the platform.
Of course, in order to obtain more refined log data, the log data of each service type may be classified according to sub-service types, and stored in the distributed file system in a partitioned manner. In the embodiment of the present application, the sub-service types may be obtained by further finely dividing the service types.
Specifically, the logs of each service type may be classified and stored in a partitioned manner according to the difference of items and time.
For example, the user class log data partition is stored in the following directory:
bucket list/analyze/init/user/{ service partition }/{ date };
wherein, the service partition can be any defined character.
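The partitioned directory layout above can be sketched as follows. The bucket name, the ISO date format, and the function name `partition_dir` are assumptions made for illustration only.

```python
from datetime import date
from pathlib import PurePosixPath


def partition_dir(bucket: str, service_type: str,
                  service_partition: str, day: date) -> str:
    """Build <bucket>/analyze/init/<type>/<service partition>/<date>."""
    return str(PurePosixPath(bucket, "analyze", "init", service_type,
                             service_partition, day.isoformat()))


print(partition_dir("bucket", "user", "app_a", date(2019, 9, 17)))
```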
In the above embodiment, the log data of each service type is stored in the distributed file system according to the service type, or is stored in the distributed file system in partitions according to the sub-service type. Accordingly, loading the log data of each service type into the corresponding logic table may include: respectively loading the log data of each service type stored in the distributed file system into the corresponding logic tables; or respectively loading the log data stored in each partition into the corresponding logic table.
Specifically, the data directory generated by the file system every day can be loaded into the corresponding logic table according to the data partition by the Hadoop-hive configured in the scheduling job process.
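One way a scheduled job could attach a daily directory to a Hive table is to issue an `ALTER TABLE ... ADD PARTITION` statement. The sketch below only builds the statement text; the table name and the partition columns `service` and `dt` are hypothetical, and inputs are assumed to be trusted because no escaping is performed.

```python
def add_partition_statement(table: str, service_partition: str,
                            day: str, location: str) -> str:
    """Build a HiveQL statement that registers one directory as a partition."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (service='{service_partition}', dt='{day}') "
        f"LOCATION '{location}'"
    )


stmt = add_partition_statement(
    "user_log", "app_a", "2019-09-17",
    "/analyze/init/user/app_a/2019-09-17",
)
print(stmt)
```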
In the above embodiment, the log data of each service type may be loaded into the corresponding logic table, or the log data stored in each partition may be loaded into the corresponding logic table, where the log data stored in each partition is the log data of a sub-type of each service type, and therefore step S102 and step S103 may specifically include: step S1021 (not shown in the figure) and step S1031 (not shown in the figure); alternatively, step S1022 (not shown in the figure) and step S1032 (not shown in the figure), wherein,
and S1021, acquiring a logic table corresponding to the service type.
And step S1031, generating corresponding service indexes based on the logic table corresponding to the service types.
And the logic table corresponding to the service type comprises log data of the service type.
Step S1022, determine a sub-service type in the service types of the service index to be generated, and obtain a logic table corresponding to the sub-service type.
Step S1032 generates a corresponding service index based on the logic table corresponding to the sub-service type.
And the logic table corresponding to the sub-service type comprises the log data of the sub-service type.
For the embodiment of the present application, if the service index requested to be generated in the service index generation request is only a service index of a certain service type, for example, a service index of a user class, or the distributed storage file is only stored according to a service type and is not partitioned according to a subtype, step S1021 and step S1031 may be executed to obtain a service index; if the service index requested to be generated in the service index generation request is a service index of a sub-type under a certain service type, for example, a service index of a certain date (e.g., 2019.09.17) under a user class, or the distributed storage file is only partitioned according to sub-service types, step S1022 and step S1032 may be executed. But is not limited to the above.
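The two alternatives can be contrasted in a short sketch. Logic tables are modeled here as nested dictionaries (service type, then sub-service type, then rows), and the index "number of distinct users" together with the field name `uid` are hypothetical examples rather than indexes defined by the source.

```python
def select_rows(tables, service_type, sub_service_type=None):
    """Return only the rows needed for the requested service index."""
    partitions = tables[service_type]
    if sub_service_type is None:
        # Steps S1021/S1031: use the whole logic table of the service type.
        return [row for rows in partitions.values() for row in rows]
    # Steps S1022/S1032: use only the logic table of the sub-service type.
    return list(partitions.get(sub_service_type, []))


def distinct_users(rows):
    """Hypothetical service index: number of distinct user ids."""
    return len({row["uid"] for row in rows})


tables = {
    "user": {
        "2019-09-16": [{"uid": "1"}, {"uid": "2"}],
        "2019-09-17": [{"uid": "2"}, {"uid": "3"}],
    },
    "event": {"2019-09-17": [{"uid": "9"}]},
}
print(distinct_users(select_rows(tables, "user")))                # 3
print(distinct_users(select_rows(tables, "user", "2019-09-17")))  # 2
```

In both cases the rows of other service types, such as `event`, are never read, which is the point of classifying the log data in advance.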
For the embodiment of the present application, in the above embodiment, processing log data and generating a corresponding service index are performed in a logic layer of a data structure, and after the service index is generated, the generated service index may be displayed in a preset format in a display layer.
For example, the user index, the consumption index, the event index, and the like are displayed in the display layer in an Excel format.
The foregoing embodiments describe the data processing method from the perspective of a method flow, and the following embodiments describe the data processing apparatus from the perspective of a virtual module or a virtual unit, which are described in detail in the following embodiments:
an embodiment of the present application provides a data processing apparatus, and as shown in fig. 2, the data processing apparatus 20 may include: a determination module 21, a first acquisition module 22, and a generation module 23, wherein,
the determining module 21 is configured to determine a service type of a service indicator to be generated when a service indicator generation request is received.
The first obtaining module 22 is configured to obtain log data of the service type.
The log data of each type is obtained by classifying the log data collected by the log collection server according to different service types.
And the generating module 23 is configured to generate a corresponding service index based on the log data of the service type.
In a possible implementation manner of the embodiment of the present application, the apparatus 20 further includes: a second obtaining module, a processing module, and a loading module, wherein,
and the second acquisition module is used for acquiring the log data from the log collection server.
Wherein the log data is generated by the log collection server based on the detected request in the preset format.
The first obtaining module 22 and the second obtaining module may be the same obtaining module or different obtaining modules, which is not limited in the embodiments of the present application.
The processing module is used for carrying out preset processing on the acquired log data to obtain the log data of each service type;
and the loading module is used for loading the log data of each service type into the corresponding logic table respectively.
In another possible implementation manner of the embodiment of the application, when the processing module performs preset processing on the acquired log data to obtain the log data corresponding to each service type, the processing module is specifically configured to:
intercepting the acquired log data according to a first preset rule to obtain intercepted multiple sections of log data;
decoding each section of log data in the intercepted plurality of sections of log data;
carrying out format conversion processing on each section of log data after decoding processing;
and performing data mapping on each section of log data after format conversion according to the service type to obtain the log data of each service type.
In another possible implementation manner of the embodiment of the present application, the processing module is further configured to: before decoding each section of intercepted log data, decrypting each section of intercepted log data containing the encrypted identification; when the processing module decodes the intercepted log data, the processing module is specifically configured to: and decoding each section of log data after decryption.
In another possible implementation manner of the embodiment of the present application, the processing module is further configured to: before any section of intercepted log data is decoded, and when any section of log data contains an encryption identifier, any section of log data is decrypted.
In another possible implementation manner of the embodiment of the present application, the apparatus 20 further includes: a first storage module, wherein,
the first storage module is used for storing the log data of each service type to the distributed file system according to the service type, or storing the log data of each service type to the distributed file system in a partition mode according to the sub-service type, and storing the log data of different sub-service types in different partitions.
In another possible implementation manner of the embodiment of the present application, when the loading module loads log data of each service type into a corresponding logic table, the loading module is specifically configured to: respectively loading the log data of each service type stored in the distributed file system into corresponding logic tables; or the log data stored in each partition is loaded into the corresponding logic table respectively.
In another possible implementation manner of the embodiment of the present application, when acquiring the log data of the service type, the first acquiring module 22 is specifically configured to: acquiring a logic table corresponding to the service type; when the generating module 23 generates a corresponding service index based on the log data of the service type, it is specifically configured to: and generating a corresponding service index based on a logic table corresponding to the service type, wherein the logic table corresponding to the service type comprises log data of the service type.
In another possible implementation manner of the embodiment of the present application, when acquiring the log data of the service type, the first acquiring module 22 is specifically configured to: determining a sub-service type in the service types of the service indexes to be generated, and acquiring a logic table corresponding to the sub-service type; when the generating module 23 generates a corresponding service index based on the log data of the service type, it is specifically configured to: and generating a corresponding service index based on a logic table corresponding to the sub-service type, wherein the logic table corresponding to the sub-service type comprises log data of the sub-service type.
In another possible implementation manner of the embodiment of the application, when the second obtaining module obtains the log data from the log collecting server, the second obtaining module is specifically configured to:
when the log data in the log collection server is monitored to be changed, the changed log data is obtained through a log collection system Flume and uploaded to a message queue Kafka;
and pulling the changed log data from Kafka through Spark-streaming.
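The flow of changed log data from the collector through the message queue to the streaming consumer can be imitated in memory. This sketch deliberately does not use the log collection system, Kafka, or Spark-streaming; a `queue.Queue` stands in for the message queue, an offset into a list stands in for change monitoring, and both function names are hypothetical.

```python
import queue


def collect_changes(log_lines, offset, message_queue):
    """Collector stand-in: forward lines appended since `offset`."""
    for line in log_lines[offset:]:
        message_queue.put(line)
    return len(log_lines)  # new offset: everything so far has been forwarded


def pull_batch(message_queue, max_items=100):
    """Consumer stand-in: pull one micro-batch without blocking."""
    batch = []
    while len(batch) < max_items:
        try:
            batch.append(message_queue.get_nowait())
        except queue.Empty:
            break
    return batch


mq = queue.Queue()
log = ["a", "b"]
offset = collect_changes(log, 0, mq)
log.append("c")  # the log on the server changes
offset = collect_changes(log, offset, mq)
batch = pull_batch(mq)
print(batch)
```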
In another possible implementation manner of the embodiment of the present application, the apparatus 20 further includes: a second storage module and a data cleaning module, wherein,
the second storage module is used for storing the changed log data to the distributed file system in real time through Flume and Kafka;
and the data cleaning module is used for cleaning the stored log data at preset time intervals through the distributed file system.
For the embodiment of the present application, the first storage module and the second storage module may be the same storage module or different storage modules, and are not limited in the embodiment of the present application.
In the prior art, all log data need to be traversed when each type of index data is determined. In contrast, the data processing device provided by the embodiment of the present application determines the service type of the service index to be generated when a service index generation request is received, then obtains the log data of the service type, where the log data of each type is obtained by classifying the log data collected by the log collection server according to different service types, and then generates the corresponding service index based on the log data of the service type. Because the collected log data is classified according to different service types in advance, when a certain service index is generated, only the log data of the corresponding service type needs to be obtained from the pre-classified log data, and all log data does not need to be traversed. This reduces the cost and complexity of data processing, shortens the time for obtaining the index data, and thereby improves the user experience.
The data processing apparatus of this embodiment can execute the data processing method provided in this embodiment, and the implementation principles thereof are similar and will not be described herein again.
The above embodiments describe a data processing method from the perspective of a method flow and a data processing apparatus from the perspective of a virtual module or a virtual unit. The following embodiments describe an electronic device, which may include a cloud device, a local server, or a terminal device, and which may be configured to execute the operations corresponding to the data processing method in the foregoing method embodiment, as described in detail below:
an embodiment of the present application provides an electronic device. As shown in fig. 3, the electronic device 3000 includes: a processor 3001 and a memory 3003. The processor 3001 is coupled to the memory 3003, for example via a bus 3002. Optionally, the electronic device 3000 may further comprise a transceiver 3004. It should be noted that, in practical applications, the number of transceivers 3004 is not limited to one, and the structure of the electronic device 3000 does not constitute a limitation on the embodiment of the present application.
The processor 3001 may be a CPU, a general purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 3001 may also be a combination that implements computing functions, e.g., a combination comprising one or more microprocessors, a combination of a DSP and a microprocessor, or the like.
The memory 3003 is used for storing application program codes for performing the present scheme, and is controlled to be executed by the processor 3001. The processor 3001 is configured to execute application program code stored in the memory 3003 to implement any of the method embodiments shown above.
An embodiment of the present application provides an electronic device, where the electronic device includes: a memory and a processor; and at least one program stored in the memory for execution by the processor, which, when executed by the processor, implements the following: when a service index generation request is received, the service type of the service index to be generated is determined; then log data of the service type is obtained, where the log data of each type is obtained by classifying the log data collected by a log collection server according to different service types; and then the corresponding service index is generated based on the log data of the service type. Because the collected log data is classified according to different service types in advance, when a certain service index is generated, only the log data of the corresponding service type needs to be obtained from the pre-classified log data, and all log data does not need to be traversed. This reduces the cost and complexity of data processing, shortens the time for obtaining the index data, and thereby improves the user experience.
The present application provides a computer-readable storage medium, on which a computer program is stored, which, when running on a computer, enables the computer to execute the corresponding content in the foregoing method embodiments. Compared with the prior art, the method and the device for generating the service indexes have the advantages that when a service index generation request is received, the service type of the service index to be generated is determined, then the log data of the service type are obtained, the log data of various types are obtained by classifying the log data collected by the log collection server according to different service types, and then the corresponding service index is generated based on the log data of the service type. The collected log data are classified according to different service types in advance, when a certain service index is generated, only the log data of the service index type need to be obtained from the log data classified in advance, and all log data do not need to be traversed, so that the cost and complexity of data processing can be reduced, the time for obtaining the index data can be reduced, and the user experience can be further improved.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
The foregoing is only a partial embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.
Claims (13)
1. A data processing method, comprising:
when a service index generation request is received, determining the service type of a service index to be generated;
acquiring log data of the service types, wherein the log data of each type are obtained by classifying the log data collected by a log collection server according to different service types;
and generating a corresponding service index based on the log data of the service type.
2. The method of claim 1, wherein before the determining of the service type of the service index to be generated, the method further comprises:
acquiring log data from a log collection server, wherein the log data is generated by the log collection server based on a detected request in a preset format;
performing preset processing on the acquired log data to obtain the log data of each service type;
and respectively loading the log data of each service type into the corresponding logic table.
3. The method according to claim 2, wherein the performing the preset processing on the acquired log data to obtain the log data corresponding to each service type includes:
intercepting the acquired log data according to a first preset rule to obtain intercepted multiple sections of log data;
decoding each section of log data in the intercepted plurality of sections of log data;
carrying out format conversion processing on each section of log data after decoding processing;
and performing data mapping on each section of log data after format conversion according to the service type to obtain the log data of each service type.
4. The method of claim 3, wherein before the decoding of each section of the intercepted log data, the method further comprises:
carrying out decryption processing on the log data containing the encrypted identification in each intercepted segment of log data;
wherein, the decoding process of each section of intercepted log data includes:
and decoding each section of log data after decryption.
5. The method of claim 3, wherein before decoding any section of intercepted log data, the method further comprises:
if that section of log data contains the encryption identifier, decrypting that section of log data.
6. The method according to any one of claims 2 to 5, wherein after the preset processing is performed on the acquired log data to obtain the log data of each service type, the method further comprises any one of the following:
storing the log data of each service type to a distributed file system according to the service type;
and storing the log data of each service type into a distributed file system according to the sub-service type partitions, wherein different partitions store the log data of different sub-service types.
7. The method according to claim 6, wherein the loading the log data of each service type into the corresponding logical table respectively comprises any one of:
respectively loading the log data of each service type stored in the distributed file system into corresponding logic tables;
and respectively loading the log data stored in each partition into the corresponding logic table.
8. The method of claim 7, wherein the acquiring of the log data of the service type and the generating of a corresponding service index based on the log data of the service type comprise any one of the following:
acquiring a logic table corresponding to the service type, and generating a corresponding service index based on the logic table corresponding to the service type, wherein the logic table corresponding to the service type comprises log data of the service type;
determining a sub-service type in the service types of the service indexes to be generated, acquiring a logic table corresponding to the sub-service type, and generating a corresponding service index based on the logic table corresponding to the sub-service type, wherein the logic table corresponding to the sub-service type comprises log data of the sub-service type.
9. The method of claim 2, wherein obtaining log data from a log collection server comprises:
when the log data in the log collection server is monitored to be changed, the changed log data is obtained through a log collection system Flume and is uploaded to a message queue Kafka;
and pulling the changed log data from Kafka through Spark-streaming.
10. The method of claim 9, wherein after the changed log data is obtained through the log collection system Flume, the method further comprises:
storing the changed log data to a distributed file system in real time through Flume and Kafka;
and performing data cleaning on the stored log data at preset time intervals through the distributed file system.
11. A data processing apparatus, comprising:
the determining module is used for determining the service type of the service index to be generated when a service index generating request is received;
the first acquisition module is used for acquiring the log data of the service types, and the log data of each type are obtained by classifying the log data collected by the log collection server according to different service types;
and the generating module is used for generating a corresponding service index based on the log data of the service type.
12. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the data processing method according to any one of claims 1 to 10.
13. A computer readable storage medium storing at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement a data processing method according to any one of claims 1 to 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910990260.0A CN111796993B (en) | 2019-10-17 | 2019-10-17 | Data processing method and device, electronic equipment and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111796993A (en) | 2020-10-20 |
CN111796993B (en) | 2023-03-17 |
Family
ID=72805609
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910990260.0A Active CN111796993B (en) | 2019-10-17 | 2019-10-17 | Data processing method and device, electronic equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111796993B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113094342A (en) * | 2021-04-02 | 2021-07-09 | 上海中通吉网络技术有限公司 | Data persistence method, device and equipment and storage medium |
CN113568967A (en) * | 2021-07-29 | 2021-10-29 | 掌阅科技股份有限公司 | Dynamic extraction method of time sequence index data, electronic equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105608203A (en) * | 2015-12-24 | 2016-05-25 | Tcl集团股份有限公司 | Internet of things log processing method and device based on Hadoop platform |
CN106790572A (en) * | 2016-12-27 | 2017-05-31 | 广州华多网络科技有限公司 | The system and method that a kind of distributed information log is collected |
US20180074852A1 (en) * | 2016-09-14 | 2018-03-15 | Salesforce.Com, Inc. | Compact Task Deployment for Stream Processing Systems |
CN107979477A (en) * | 2016-10-21 | 2018-05-01 | 苏宁云商集团股份有限公司 | A kind of method and system of business monitoring |
CN109274540A (en) * | 2018-11-16 | 2019-01-25 | 四川长虹电器股份有限公司 | A kind of web access log processing method based on storm |
Also Published As
Publication number | Publication date |
---|---|
CN111796993B (en) | 2023-03-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107577805B (en) | Business service system for log big data analysis | |
US11836533B2 (en) | Automated reconfiguration of real time data stream processing | |
Barika et al. | Orchestrating big data analysis workflows in the cloud: research challenges, survey, and future directions | |
CN109074377B (en) | Managed function execution for real-time processing of data streams | |
EP3342137B1 (en) | Edge intelligence platform, and internet of things sensor streams system | |
US9946593B2 (en) | Recovery strategy for a stream processing system | |
US9965330B2 (en) | Maintaining throughput of a stream processing framework while increasing processing load | |
CN108039959B (en) | Data situation perception method, system and related device | |
US20190155646A1 (en) | Providing strong ordering in multi-stage streamng processing | |
CN104537076B (en) | A kind of file read/write method and device | |
US20170083380A1 (en) | Managing resource allocation in a stream processing framework | |
CN109831478A (en) | Rule-based and model distributed processing intelligent decision system and method in real time | |
CN109446274B (en) | Method and device for managing BI metadata of big data platform | |
Poojara et al. | Serverless data pipeline approaches for IoT data in fog and cloud computing | |
CN111177237B (en) | Data processing system, method and device | |
CN111431926A (en) | Data association analysis method, system, equipment and readable storage medium | |
CN111796993B (en) | Data processing method and device, electronic equipment and computer readable storage medium | |
CN114265680A (en) | Mass data processing method and device, electronic equipment and storage medium | |
CN110557291A (en) | Network service monitoring system | |
Akanbi | Estemd: A distributed processing framework for environmental monitoring based on apache kafka streaming engine | |
CN115964392A (en) | Real-time monitoring method, device and equipment based on flink and readable storage medium | |
CN114401239A (en) | Metadata transmission method and device, computer equipment and storage medium | |
US9912545B2 (en) | High performance topology resolution for non-instrumented nodes | |
Pourmajidi et al. | Dogfooding: Using ibm cloud services to monitor ibm cloud infrastructure | |
CN110019045B (en) | Log floor method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||