CN116132540B - Multi-service system data processing method and device

Publication number
CN116132540B
Authority
CN
China
Prior art keywords
type
data
message queue
service
response
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310389936.7A
Other languages
Chinese (zh)
Other versions
CN116132540A (en
Inventor
祝敏
刘磊
郝牛牛
罗洋洋
胡江龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Distance Education Holdings Ltd
Original Assignee
China Distance Education Holdings Ltd

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a multi-service system data processing method and device. The multi-service system comprises a plurality of independently operating service modules, which run set service functions in response to indication information from a network side. The method comprises the following steps: collecting the indication information at the network side of the multi-service system to generate a first message queue; collecting response information of at least one service module and adding it to the first message queue; reading the first message queue and, in response to first-type indication information in the indication information, generating first-type statistical data online and adding it to a second message queue; and, in response to first-type response information in the response information, generating third-type statistical data online and adding it to the second message queue. The method and the device solve the problem that index processing in an online service system reduces service processing efficiency.

Description

Multi-service system data processing method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for processing data on line in multiple services.
Background
When online services are provided to clients over the Internet, multiple service functions run to meet different user demands; correspondingly, the server side hosts a complex computer system composed of multiple business-processing software and hardware modules. While the service system operates, it must not only deliver service content in response to customer instructions, but also compile statistics on how the service functions used by each user are executed, in order to evaluate service efficiency or to implement functions such as billing. The associated index processing and its results are determined by the working state of the service system.
Take an online learning system as an example. Statistics on students' learning are usually produced either by a T+1 offline scheduled script (that is, the results lag by one day) or by calculation logic tightly coupled into the main-line service. The latter affects the main-line service and increases system complexity. Moreover, data such as the learning status of students in a class, question rankings, and the learning status of students in a certain department are generated across every service system, which makes the service systems difficult to maintain.
When the service system processes the related indexes online while providing service, its processing burden increases. In some cases, index processing in one service module also depends on the working parameters of another service module, so the service modules frequently call each other for data, which occupies interface resources and reduces the response capacity of the system.
However, when index processing is performed offline, customer demands cannot be met in time, for example evaluating learning quality in real time during one stage of online learning, or predicting the learning content of the next stage. How to achieve efficient real-time processing is therefore also a problem to be solved.
Disclosure of Invention
The application provides a data processing method and device for a multi-service system, which solve the problem that the service processing efficiency is reduced due to index processing of an online service system.
In a first aspect, an embodiment of the present application provides a data processing method of a multi-service system, where the multi-service system includes a plurality of independently operating service modules, and the service modules respond to indication information from a network side to operate and set service functions, and the method includes the following steps:
collecting the indication information at a network side of the multi-service system to generate a first message queue;
collecting response information of at least one service module, and entering the first message queue;
reading the first message queue, responding to first type indication information in the indication information, generating first type statistical data on line, and entering a second message queue; responding to the first type of response information in the response information, generating third type of statistical data on line, and entering a second message queue;
and any service module acquires at least one part of data in the first type of statistical data and/or at least one part of data in the third type of statistical data from the second message queue and outputs a response to the network side.
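The four steps above can be sketched with in-memory queues. This is only an illustration under stated assumptions: the class name `TwoQueuePipeline`, the field names, and the `"first"`/`"second"` type labels are hypothetical, and a real deployment would use a message broker rather than Python containers.

```python
from collections import deque


class TwoQueuePipeline:
    """In-memory stand-in for the first and second message queues (hypothetical)."""

    def __init__(self):
        self.first_queue = deque()  # collected indication and response information
        self.second_queue = []      # statistical data published for subscribers

    def collect(self, source, kind, module, payload):
        # source: "indication" (from the network side) or "response" (from a service module)
        # kind: "first" or "second", an illustrative label for the message type
        self.first_queue.append(
            {"source": source, "kind": kind, "module": module, "payload": payload}
        )

    def process_online(self):
        # Read the first queue; only first-type messages become statistics online.
        while self.first_queue:
            msg = self.first_queue.popleft()
            if msg["kind"] != "first":
                continue
            stat = "first_type" if msg["source"] == "indication" else "third_type"
            self.second_queue.append(
                {"stat": stat, "module": msg["module"], "data": msg["payload"]}
            )

    def subscribe(self, wanted):
        # Any service module pulls the statistics it needs from the second queue.
        return [s for s in self.second_queue if s["stat"] in wanted]


pipeline = TwoQueuePipeline()
pipeline.collect("indication", "first", "course", {"student": 42, "seconds": 30})
pipeline.collect("response", "first", "quiz", {"subject": "accounting"})
pipeline.collect("indication", "second", "course", {"student": 42, "seconds": 5})
pipeline.process_online()
stats = pipeline.subscribe({"first_type", "third_type"})
print([s["stat"] for s in stats])
```

In this sketch the service modules never compute statistics themselves and never call each other; they only read from the second queue.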
Preferably, the method further comprises the steps of: reading the first message queue and storing the first message queue into a first database; reading the first database, and generating first type statistical data offline in response to the set first type indication information; and/or reading the first database, and generating third type of statistical data offline in response to the set first type of response information. Further preferably, the method further comprises the steps of: reading the second message queue and storing the second message queue in a second database; and comparing the second database with the offline processing result to generate a comparison result.
Preferably, the method further comprises the steps of: reading the second message queue and storing the second message queue in a second database; and any service module acquires at least one part of data in the first type of statistical data and/or at least one part of data in the third type of statistical data from the second database and outputs a response to the network side.
Preferably, the method further comprises the steps of: reading the first message queue and storing the first message queue into a first database; reading the first database, and generating second type statistical data offline in response to the set second type indication information; and/or reading the first database, and generating fourth type statistical data offline in response to the set second type response information; and any service module reads at least part of the second type of statistical data and/or at least part of the fourth type of statistical data from the local and outputs a response to the network side.
Further preferably, the process of generating the first type of statistical data online from the first type of indication information is timed, and when its duration exceeds a set threshold, the first type of indication information is changed into the second type of indication information; and/or the process of generating the third type of statistical data online from the first type of response information is timed, and when its duration exceeds a set threshold, the first type of response information is changed into the second type of response information.
In a second aspect, the present application further proposes a multi-service system data processing apparatus, configured to implement a method according to any one of the embodiments of the first aspect of the present application, including:
the first message module is used for collecting the indication information and the response information of at least one service module, generating a first message queue and publishing the first message queue;
the real-time processing module is used for generating first-type statistical data online in response to the first-type indication information and/or generating third-type statistical data online in response to the first-type response information;
and the second message module is used for collecting the first type of statistical data and the third type of statistical data, generating a second message queue and publishing the second message queue for any service module to subscribe to.
Further, the multi-service system data processing device further comprises:
a first database for storing a first message queue;
the off-line processing module is used for reading the first database and generating first-type statistical data and/or third-type statistical data off-line;
and the second database, into which the second message queue is routinely imported, forming a massively parallel processing computing engine for reading by any service module.
In a third aspect, the present application also proposes a computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, implements a method as described in any of the embodiments of the first aspect of the present application.
In a fourth aspect, the present application also proposes an electronic device comprising a memory, a processor and a computer program stored on the memory and executable by the processor, the processor implementing a method according to any of the embodiments of the first aspect of the present application when executing the computer program.
At least one of the technical solutions adopted in the embodiments of the present application can achieve the following beneficial effects:
A data processing model that is independent of the service system reduces the complexity of the service system, supports real-time processing of data indexes, and improves user experience.
The main-line service and the index statistics service are decoupled, so the statistics service has no negative influence on the main-line service.
The calculation model for statistical indexes is unified, which avoids scattering the statistical indexes across service lines with a different processing mode in each service, simplifies development, and saves operating cost.
Real-time data index statistics can be performed, and user experience is improved.
Multi-link monitoring ensures data quality and index accuracy, and avoids the prior-art situation in which dirty data, abnormal data, data loss and similar problems are only discovered after the fact.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a flowchart of an embodiment of a method for processing data in a multi-service system according to the present application;
FIG. 2 is a block diagram of an embodiment of a multi-service system data processing apparatus of the present application;
FIG. 3 is a technical framework of a real-time processing function of the multi-service system data processing apparatus of the present application;
FIG. 4 is an embodiment of the multi-service system real-time processing logic of the present application;
FIG. 5 is a block diagram of an embodiment of a multi-service system implementing a monitoring processing function in the present application.
Detailed Description
To make the purposes, technical solutions and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to specific embodiments of the present application and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments that a person of ordinary skill in the art can obtain from the present disclosure without creative effort fall within the scope of the present disclosure.
The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.
Fig. 1 is a flowchart of an embodiment of a data processing method of a multi-service system of the present application.
In a first aspect, an embodiment of the present application provides a data processing method of a multi-service system, where the multi-service system includes a plurality of independently operating service modules, and the service modules respond to indication information from a network side to operate and set service functions, and the method includes the following steps:
Step 110, real-time flow data processing.
At the input end of the multi-service system, for example at the network-side interface of each service module, the flow (event-stream) Nginx log is collected in real time by Flume (a streaming log collection tool).
Further, the real-time flow data processing of the present application comprises the following steps 110A and 110B:
step 110A, collecting the indication information at a network side (namely a consumption collection end) of the multi-service system to generate a first message queue;
step 110B, reading the first message queue, responding to first type indication information in the indication information, generating first type statistical data on line, and entering a second message queue;
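As an illustration of how an Nginx log line might be turned into indication information before it enters the first message queue, the sketch below parses one line. It assumes the log uses Nginx's default "combined" format and that the indication is carried in the request's query string; the patent does not specify the log format, and the URL and parameter names here are hypothetical.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Leading fields of the Nginx "combined" log format (assumed, not stated in the source).
COMBINED = re.compile(
    r'(?P<addr>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_indication(line):
    """Return a dict describing the request, or None if the line does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    parts = urlsplit(m["uri"])
    # Keep the first value of each query parameter.
    params = {k: v[0] for k, v in parse_qs(parts.query).items()}
    return {
        "path": parts.path,
        "params": params,
        "status": int(m["status"]),
        "time": m["time"],
    }


sample = (
    '203.0.113.7 - - [10/Apr/2023:08:15:30 +0800] '
    '"GET /course/watch?student=42&seconds=30 HTTP/1.1" 200 512 "-" "Mozilla/5.0"'
)
event = parse_indication(sample)
print(event["path"], event["params"])
```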
Step 120, real-time dimension data processing.
Further, the real-time dimension data processing of the present application comprises the following steps 120A and 120B:
step 120A, collecting response information of at least one service module, and entering the first message queue;
step 120B, responding to the first type of response information in the response information, generating third type of statistical data on line, and entering a second message queue;
step 130, response of the multi-service system to the publish-subscribe message.
And any service module acquires at least one part of data in the first type of statistical data and/or at least one part of data in the third type of statistical data from the second message queue and outputs a response to the network side. It should be noted that, any of the service modules may be one of the at least one service module that sends the response information in step 120, or may be different from any of the at least one service module.
Service modules 1 to N are shown in FIG. 2 with reference numerals 201 to 20N. When said any service module (for example, service module 1) is one of the at least one service module, the service module no longer has to compute statistics on the indication information or the response information itself: the statistical data is generated online by a separate real-time processing process, and the service module obtains the index data it needs from the second message queue by subscription. The processing load of the service module is therefore reduced relative to the conventional prior-art approach.
When said any service module (for example, service module 2) is not one of the at least one service module (for example, service module 1), service module 2 is spared not only the process of computing statistics on the indication information, but also the process of accessing service module 1 (reference numeral 201) to obtain its response information and processing that information into third-type statistical data. Likewise, service module 2 need not access service module 1 to obtain third-type statistical data that service module 1 has computed from the response information. Instead, service module 2 obtains the required index data from the second message queue by subscription. Relative to the conventional prior-art approach, the processing load of both service module 1 and service module 2 is reduced.
Step 140, response of the multi-service system to the real-time update database.
Optionally, the present application further includes step 140 (the dashed box in the figure represents an optional process). Preferably, the method further comprises the steps of: reading the second message queue and storing the second message queue in a second database; and any service module acquires at least one part of data in the first type of statistical data and/or at least one part of data in the third type of statistical data from the second database and outputs a response to the network side.
Step 130 and step 140 are each optional, but at least one of them is performed. This gives three possible schemes: step 130 only, step 140 only, or both.
Step 150, a step of offline redundancy processing.
Redundancy processing means that the online processing functions are run again in an offline mode. The data handled by offline redundancy processing is the same as the data processed online.
Preferably, the method further comprises the steps of: reading the first message queue and storing the first message queue into a first database; reading the first database, and generating first type statistical data offline in response to the set first type indication information; and/or reading the first database, and generating third type of statistical data offline in response to the set first type of response information. Further preferably, the method further comprises the steps of: reading the second message queue and storing the second message queue in a second database; and comparing the second database with the offline processing result to generate a comparison result. By comparison, errors occurring during online processing can be identified.
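The comparison between the second database and the offline result can be sketched as a key-by-key diff. The function name, the report layout, and the example keys are assumptions made for illustration; the patent does not prescribe a comparison algorithm.

```python
def compare_results(online, offline, tolerance=0.0):
    """Compare two {key: numeric value} mappings and report the differences."""
    report = {"missing_online": [], "missing_offline": [], "mismatched": []}
    for key in sorted(set(online) | set(offline)):
        if key not in online:
            report["missing_online"].append(key)    # e.g. data lost in the online path
        elif key not in offline:
            report["missing_offline"].append(key)
        elif abs(online[key] - offline[key]) > tolerance:
            report["mismatched"].append((key, online[key], offline[key]))
    return report


# Hypothetical statistics keyed by (student, index name).
online_stats = {("s1", "watch_seconds"): 120, ("s2", "watch_seconds"): 45}
offline_stats = {
    ("s1", "watch_seconds"): 120,
    ("s2", "watch_seconds"): 60,
    ("s3", "watch_seconds"): 10,
}
report = compare_results(online_stats, offline_stats)
print(report)
```

A non-empty report flags keys whose online value should be rechecked or corrected from the offline result.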
Step 160, offline division processing.
Division processing means that some tasks are processed online while the others are processed offline. The data handled by offline division processing is different from the data processed online.
Preferably, the method further comprises the steps of: reading the first message queue and storing the first message queue into a first database; reading the first database, and generating second type statistical data offline in response to the set second type indication information; and/or reading the first database, and generating fourth type statistical data offline in response to the set second type response information; and any service module reads at least part of the second type of statistical data and/or at least part of the fourth type of statistical data from the local and outputs a response to the network side.
Offline division processing assigns part of the indication information and/or response information to offline processing, so that index calculations with low priority or low timeliness requirements do not occupy online resources.
Further preferably, the method may further comprise the steps of:
step 170, timing the process of generating the first type of statistical data on line according to the first type of indication information, and changing the first type of indication information into the second type of indication information in response to the time length of the first type of indication information exceeding a set threshold; and/or timing the process of generating the third type of statistical data on line according to the first type of response information, and changing the first type of response information into the second type of response information in response to the time length of the first type of response information exceeding a set threshold.
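Step 170 can be illustrated with a small timer that demotes a message type once its online computation runs too long. The class name `OnlineTimer`, the index names, and the injected fake clock are hypothetical choices for this sketch; the patent only states that the process is timed and compared with a set threshold.

```python
import time


class OnlineTimer:
    """Times online computations and demotes slow ones from first to second type."""

    def __init__(self, threshold_seconds):
        self.threshold_seconds = threshold_seconds
        self.kind_of = {}  # message name -> "first" (online) or "second" (offline)

    def register(self, name):
        self.kind_of[name] = "first"

    def run(self, name, compute, clock=time.perf_counter):
        start = clock()
        result = compute()
        elapsed = clock() - start
        # Exceeding the threshold changes the type, so later messages go offline.
        if self.kind_of[name] == "first" and elapsed > self.threshold_seconds:
            self.kind_of[name] = "second"
        return result


# A fake clock makes the example deterministic: 0.2 s for the first run, 2.5 s for the second.
ticks = iter([0.0, 0.2, 10.0, 12.5])
timer = OnlineTimer(threshold_seconds=1.0)
timer.register("watch_seconds")
timer.register("class_ranking")
timer.run("watch_seconds", lambda: 30, clock=lambda: next(ticks))
timer.run("class_ranking", lambda: [3, 1, 2], clock=lambda: next(ticks))
print(timer.kind_of)
```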
The execution subjects of the steps of the method provided in embodiment 1 may be the same apparatus, or the method may be executed by different apparatuses. For example, the execution subject of step 110 and step 120 may be device 1, and the execution subject of step 130 may be device 2; for another example, the execution body of the steps 110 to 130 may be the device 1, and the execution bodies of the steps 140 and 150 may be the device 2; etc.
Fig. 2 is a block diagram of an embodiment of a data processing apparatus of a multi-service system according to the present application.
The multi-service system 20 includes a plurality of independently operated service modules 201, …,20N, which operate set service functions in response to indication information from a network side.
The application further proposes a multi-service system data processing device, configured to implement a method according to any one of the embodiments of the first aspect of the application, where the device includes:
a first message module 21, configured to collect the indication information and the response information of at least one service module, generate a first message queue, and publish the first message queue. For example, the Nginx log of the input stream data of the multi-service system contains the indication information; the response information is business base data generated by the service modules and is collected by means of DP; and the first message module comprises, for example, a source-side Kafka-based publish/subscribe message system.
a real-time processing module 22, configured to generate the first type of statistical data online in response to the first type of indication information and/or generate the third type of statistical data online in response to the first type of response information. The real-time processing module uses a Flink-based real-time processing framework to compute the upstream stream data in real time according to the index definitions of the service.
a second message module 23, configured to collect the first type of statistical data and the third type of statistical data, generate a second message queue, and publish the second message queue for any service module to subscribe to. The first type of statistical data is real-time result data calculated according to the index definitions; the third type of statistical data is real-time dimension data generated from the business base data. The second message module comprises, for example, a sink-side Kafka-based publish/subscribe message system.
Further, the multi-service system data processing device further comprises:
a first database 24 for storing the first message queue. For example, the first database is a locally stored database whose contents may further be uploaded to a distributed file system (HDFS).
The offline processing module 25 is configured to read the first database, generate the first type of statistical data and/or the third type of statistical data offline, as shown in step 150, specifically generate the first type of statistical data offline in response to the set first type of indication information; and/or reading the first database, and generating third type of statistical data offline in response to the set first type of response information. Further, the offline processing module is further configured to generate the second type of statistical data and/or the fourth type of statistical data offline, as described in step 160, specifically, generate the second type of statistical data offline in response to the set second type of indication information; and/or reading the first database, and generating fourth type of statistical data offline in response to the set second type of response information.
a second database 26, into which the second message queue is routinely imported (Routine Load), forming a massively parallel processing (MPP) computing engine for reading by any of the service modules.
According to the method and the device, the service line and the statistical line are decoupled by processing the Nginx log, so that the service line is ensured to run stably, service developers only need to pay attention to the service, the service maintenance cost is reduced, and the service risk is reduced.
Fig. 3 is a technical framework of a real-time processing function of the multi-service system data processing device of the present application.
Dimension table data: business base data, such as specific information on coaching, subjects, and courseware, collected by means of DP.
Nginx log: the Nginx logs carrying the course-watching timing stream data and the question stream data, collected by Flume.
Kafka (source): the source-side publish/subscribe message system, which mainly receives the dimension data and stream data sent by DP and Flume.
Flink real-time processing framework: computes the upstream stream data in real time according to the index definitions of the service. The specific data index calculations are performed here.
Kafka (output: real-time dimensions and aggregated results): the upstream Flink job writes the computed dimension data and the result data calculated according to the index definitions into Kafka for downstream use. Aggregated results are obtained by aggregating upward from the fine-grained statistics and can be extended according to specific requirements.
Doris (MPP database): a massively parallel processing computing engine that provides second-level responses over massive data; two of its features, sequence key and batch delete, are used.
HDFS (file system): the data landed in Doris is also saved to a distributed file system for recovering and correcting abnormal data.
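The upward aggregation mentioned for the Kafka output stage can be illustrated in plain Python. The key layout (student, subject, lesson) and the function name `roll_up` are assumptions for this sketch; in the described system the computation would run inside the Flink job rather than as a standalone function.

```python
from collections import defaultdict


def roll_up(fine, keep):
    """Sum fine-grained values over the key positions that are not kept."""
    totals = defaultdict(int)
    for key, value in fine.items():
        coarse_key = tuple(key[i] for i in keep)
        totals[coarse_key] += value
    return dict(totals)


# Hypothetical fine-grained statistic: seconds watched per (student, subject, lesson).
fine = {
    ("s1", "accounting", "lesson-1"): 30,
    ("s1", "accounting", "lesson-2"): 45,
    ("s1", "tax", "lesson-1"): 10,
}
per_subject = roll_up(fine, keep=(0, 1))  # aggregate lessons away
per_student = roll_up(fine, keep=(0,))    # aggregate subjects and lessons away
print(per_subject)
print(per_student)
```

Computing the fine-grained result once and rolling it up keeps every coarser index consistent with the same underlying data.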
Fig. 4 is an embodiment of the multi-service system real-time processing logic of the present application.
The service modules of the multi-service system, for example a 1st service module and a 2nd service module, are described in turn below.
The 1st service module provides the course-watching service. Course-watching statistics model: the flow Nginx log of the course-watching service is collected by Flume and transmitted to a high-throughput distributed publish/subscribe message system; Flink then computes the data indexes required by the service in real time, and the data is synchronized to an MPP-architecture database to support highly concurrent real-time queries by the service.
The 2nd service module provides the question service. Question statistics model: this model is unified with the course-watching model. The flow Nginx log of the question service is collected by Flume and transmitted to a high-throughput distributed publish/subscribe message system; Flink then computes the data indexes required by the service in real time, and the data is synchronized to an MPP-architecture database to support highly concurrent real-time queries by the service.
The data processing process is specifically described as follows:
source (Source) side functionality: the provider of the data includes the data for the questions, the lessons and the related dimensions. The questions and the lessons are stream data, which are reported by a Flume collecting Nginx log, and the dimension data are obtained by a relational database, and the Kafka data storage is realized by the first message module 21.
Transform function: the real-time processing module 22 processes the source data and ensures the idempotence and accuracy of the data. For example, the embodiment shown in Fig. 4 mainly involves six components: a question-answering flow parsing component, a question-answering flow index component, a question-answering dimension table component, a course-listening flow parsing component, a course-listening flow index component, and a course-listening dimension table component. The flow parsing components parse, convert and output the flow data. The flow index components calculate and output the parsed flow data according to the specific index definitions of the service. The dimension table components process the dimension data in the relational database in real time.
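As a minimal illustration of the flow parsing and flow index components described above, the following Python sketch parses raw log lines into flow records and aggregates one per-user index. The tab-separated log layout, the field names and the function names are hypothetical, and plain Python stands in for the Flink job; this is a sketch under those assumptions, not the implementation of the embodiment.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional

# Hypothetical tab-separated layout: event id, user id, course id, seconds watched.
FIELDS = ("event_id", "user_id", "course_id", "seconds")


def parse_flow(line: str) -> Optional[dict]:
    """Flow parsing step: turn one raw log line into a flow record."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != len(FIELDS):
        return None  # malformed lines are dropped rather than guessed at
    record = dict(zip(FIELDS, parts))
    try:
        record["seconds"] = int(record["seconds"])
    except ValueError:
        return None
    return record


def watch_seconds_by_user(lines: Iterable[str]) -> Dict[str, int]:
    """Flow index step: total watched seconds per user.

    Each event_id is counted once, so feeding the same line twice
    leaves the result unchanged.
    """
    seen = set()
    totals: Dict[str, int] = defaultdict(int)
    for line in lines:
        rec = parse_flow(line)
        if rec is None or rec["event_id"] in seen:
            continue
        seen.add(rec["event_id"])
        totals[rec["user_id"]] += rec["seconds"]
    return dict(totals)
```

Counting each event identifier only once is one simple way to make the index insensitive to repeated input; the embodiment does not state which mechanism it uses.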
Output (Sink) side function: because the real-time downstream is mainly a Doris database that is synchronized by routine import, the sink is a publish-subscribe message system. There are mainly three types of Topic: the flow Topic, the index Topic and the dimension Topic. All of them realize Kafka data storage through the second message module 23 and are routinely imported into the Doris database.
In this embodiment, source flow data corresponding to user actions such as watching courses and answering questions are extracted and converted in real time and finally provided to the upper-layer service. That is, a model in which service and data statistics are decoupled replaces a model in which they are highly coupled, and a real-time calculation model replaces an offline model. This reduces the complexity of the system, improves its performance, stability and expandability, and makes it easier to maintain and upgrade; data indexes can also be produced in real time, which improves the user experience.
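The routing of output records to the three Topic types can be sketched as follows. The topic names and the function name are hypothetical, and the sketch only prepares the message; actually publishing it to Kafka is left out.

```python
import json
from typing import Tuple

# Hypothetical topic names for the three Topic types named in the text.
TOPICS = {
    "flow": "topic_flow",
    "index": "topic_index",
    "dimension": "topic_dimension",
}


def to_sink_message(kind: str, payload: dict) -> Tuple[str, bytes]:
    """Pick the output topic for a record and serialize the record as JSON bytes."""
    if kind not in TOPICS:
        raise ValueError(f"unknown record kind: {kind!r}")
    body = json.dumps(payload, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return TOPICS[kind], body
```

JSON is chosen here because the description states that the data in Kafka is typically in JSON format when it is routinely imported.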
Fig. 5 is a block diagram of an embodiment of a multi-service system implementing a monitoring processing function in the present application.
In the embodiment of the application, the main flow is real-time processing. The acquisition end submits data to Kafka; Flink consumes the acquisition-end Kafka and processes and converts the data. The question-answering and course-listening data are guaranteed to be idempotent, so repeated consumption does not affect the final result. The processed data is written to Kafka and then synchronized to Doris by routine import. Kafka is used at two places in this process, mainly to ensure the robustness of the whole flow on the one hand, and to support data verification and abnormal-data recovery on the other.
Following the data flow, the Nginx logs of users' course listening and question answering are recorded at the data input/output interfaces of each service module. The Nginx logs contain the user indication information, and may also contain instructions for calling service functions and/or for calling operation data, as well as response data carrying the operation results of each service module.
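One way to obtain the property that repeated consumption does not change the final result is a keyed write in which, for each key, the row with the largest sequence value wins. The Python sketch below illustrates that idea on an in-memory table; it is an assumption about how such idempotence could be achieved, not a statement of how the embodiment or Doris implements it, and all names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Row = Tuple[int, dict]  # (sequence value, column values)


def apply_upserts(
    table: Dict[str, Row],
    records: Iterable[Tuple[str, int, dict]],
) -> Dict[str, Row]:
    """Apply (key, seq, values) records in place.

    A record replaces the stored row only if its sequence value is not
    smaller than the stored one, so replays and out-of-order deliveries
    converge to the same table.
    """
    for key, seq, values in records:
        current = table.get(key)
        if current is None or seq >= current[0]:
            table[key] = (seq, values)
    return table
```

Applying the same batch twice, or in a different order, yields the same table, which is the behaviour the paragraph above requires of the question-answering and course-listening data.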
Data acquisition and publishing are realized by the first message module 21, whose running program completes the following function: collecting the Nginx logs of course listening and question answering in real time through Flume, and writing them into the publish-subscribe message system.
Kafka data storage is implemented by the first database 24. Part of this data is the course-listening and question-answering Nginx logs collected by Flume, and part is simulated test data. Both parts are persisted to local storage for subsequent offline data processing, data comparison and alarm, data verification and data recovery.
Flink processing is implemented by the real-time processing module 22. Flink processes the production data and the test data and synchronizes the results into Kafka (the second message module 23, running a publish-subscribe message system) for use by Doris. The Kafka data here is also persisted locally (first database), and subsequent offline data processing, data comparison and alarm, data verification and data recovery are performed through this data node.
Routine import: used to process the data in Kafka, which is typically in JSON format. Each data table corresponds to one routine import task, and a corresponding monitor checks whether the routine import process is normal.
A second database 26, such as Doris (MPP), stores the data written in real time, serves upper-layer service queries, and is used for verifying and comparing the generated real-time data. If abnormal data is found, the data is replayed from the two Kafka positions according to the located position of the abnormal data.
To interface with the upper-layer services, one or more service processing modules of the service system 20 access the second database.
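The replay of abnormal data from a located position can be sketched as re-processing a bounded range of persisted messages. In the Python sketch below, the persisted Kafka data is modeled as a list of (offset, message) pairs, and `process` stands for whatever conversion the real-time job applies; both are hypothetical simplifications.

```python
from typing import Callable, Dict, List, Tuple


def replay(
    log: List[Tuple[int, dict]],
    start_offset: int,
    end_offset: int,
    process: Callable[[dict], Tuple[str, dict]],
) -> Dict[str, dict]:
    """Re-process persisted messages whose offsets lie in [start_offset, end_offset].

    Returns the corrected rows keyed by primary key; a later message for
    the same key overwrites an earlier one.
    """
    corrected: Dict[str, dict] = {}
    for offset, message in log:
        if start_offset <= offset <= end_offset:
            key, row = process(message)
            corrected[key] = row
    return corrected
```

Because the rows are keyed, writing the corrected rows back is safe to repeat, which fits the idempotence requirement stated for the main flow.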
The data processing process monitoring of the embodiment of the application comprises data delay monitoring, operation abnormality monitoring and data index abnormality monitoring.
For example, data delay monitoring is implemented by the delay monitoring module 31. The delay monitoring module times the process of generating first-type statistical data online from the first-type indication information and/or the process of generating third-type statistical data online from the first-type response information. For example, the source end writes data carrying an identification bit into the publish-subscribe message system (using a Topic separate from the production data, which avoids data pollution); the data is then written into the publish-subscribe system after real-time calculation, and written into the Doris (MPP) database by routine import. By computing over the records carrying the identification bit in Doris, it is determined whether the time from generation, through processing, to going online exceeds a set threshold. Further, the delay monitoring module is also configured to simulate test data and store it locally (e.g., in the first database) for subsequent offline data processing. The offline-processed simulated test data is compared with the online-processed data in the second database to generate a test report.
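The threshold check on records carrying the identification bit can be sketched as follows. The record fields and names are hypothetical: it is assumed that each probe record carries the time it was produced at the source and the time it became visible in the database.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ProbeRecord:
    probe_id: str
    produced_at: float  # epoch seconds when the probe was written at the source
    visible_at: float   # epoch seconds when the probe became queryable downstream


def delayed_probes(records: Iterable[ProbeRecord], threshold_s: float) -> List[str]:
    """Return the ids of probes whose end-to-end delay exceeds threshold_s seconds."""
    return [
        r.probe_id
        for r in records
        if r.visible_at - r.produced_at > threshold_s
    ]
```

A non-empty result would indicate that the path from generation to going online is slower than the set threshold for at least one probe.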
For another example, operation abnormality monitoring is implemented by the process monitoring module 32, which mainly monitors the routine import process to check whether the routine import state is normal, so as to avoid delays in writing data to Doris.
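A state check of this kind can be sketched as a filter over the reported state of each routine import task. The sketch assumes that the task states have already been fetched into a dictionary and that only a state named `RUNNING` counts as normal; both the state name and the function name are assumptions made for illustration.

```python
from typing import Dict, List

# Assumption: only this state is treated as normal.
HEALTHY_STATES = frozenset({"RUNNING"})


def abnormal_jobs(job_states: Dict[str, str]) -> List[str]:
    """Return the names of import tasks whose state is not considered healthy."""
    return sorted(
        name
        for name, state in job_states.items()
        if state.upper() not in HEALTHY_STATES
    )
```

Since each data table corresponds to one routine import task, the returned names also identify which tables are at risk of delayed writes.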
For another example, data index abnormality monitoring is implemented by the comparison module 33. The comparison module compares the second database with the offline processing result and generates a comparison result. The offline processing result includes the first-type statistical data and third-type statistical data output by the offline processing module. At the positions of the two publish-subscribe message systems, the data is landed on the corresponding server disks and uploaded to Hdfs, and the first-type and third-type statistical data are generated offline at an hourly granularity. These data indexes are compared with the data generated online to determine whether the online-generated first-type and/or third-type statistical data are normal; when the abnormality threshold is reached, it is confirmed that abnormal data was generated in the Flink processing, and an alarm is raised.
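The comparison of online and offline indexes can be sketched as below. The relative tolerance, the ratio used as the abnormality threshold, and all names are hypothetical choices; the description only states that an alarm is raised when an abnormality threshold is reached.

```python
from typing import Dict, List, Optional, Tuple

Mismatch = Tuple[str, Optional[float], Optional[float]]


def compare_metrics(
    online: Dict[str, float],
    offline: Dict[str, float],
    rel_tol: float,
) -> List[Mismatch]:
    """Return (key, online, offline) for every key that disagrees.

    A key disagrees if it is missing on one side, or if the values differ
    by more than rel_tol relative to the offline value.
    """
    mismatches: List[Mismatch] = []
    for key in sorted(set(online) | set(offline)):
        on, off = online.get(key), offline.get(key)
        if on is None or off is None:
            mismatches.append((key, on, off))
            continue
        denom = max(abs(off), 1e-12)  # avoid division by zero
        if abs(on - off) / denom > rel_tol:
            mismatches.append((key, on, off))
    return mismatches


def should_alarm(mismatches: List[Mismatch], total_keys: int, anomaly_ratio: float) -> bool:
    """Alarm when the share of mismatching keys reaches anomaly_ratio."""
    return total_keys > 0 and len(mismatches) / total_keys >= anomaly_ratio
```

The offline value is used as the reference because the description treats the hourly offline result as the baseline against which the online data is judged.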
It should be noted that, in the above embodiment of the present application, the question-answering flow data is recorded in the Nginx log; it may also be recorded in the form of program logs, although this may affect the performance of the main service.
It should be noted that dimension table data collection may also use Canal for data conversion and transmission.
It should also be noted that, besides Kafka, the publish-subscribe message system may be replaced with another MQ; the real-time computing framework may be Flink, or Spark Streaming may be used instead.
In the above embodiment, the monitoring system may use PySpark, or may be custom-developed for a specific service.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Thus, in a third aspect, the present application also proposes a computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, implements a method according to any of the embodiments of the present application.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Further, in a fourth aspect, the present application also proposes an electronic device (or computing device) comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a method according to any of the embodiments of the present application when executing the computer program.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media. Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element. In the present application, "at least 1" means 1 or more.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (10)

1. A multi-service system data processing method, the multi-service system comprising a plurality of independently operating service modules, the service modules operating a set service function in response to indication information from a network side, comprising the steps of:
collecting the indication information at a network side of the multi-service system to generate a first message queue;
collecting response information of at least one service module, and entering the first message queue;
generating statistics online through another process in real time, including: reading the first message queue, responding to first type indication information in the indication information, generating first type statistical data on line, and entering a second message queue; responding to the first type of response information in the response information, generating third type of statistical data on line, and entering a second message queue;
and any service module acquires at least one part of data in the first type of statistical data and/or at least one part of data in the third type of statistical data from the second message queue in a subscription mode and outputs a response to the network side.
2. The multi-service system data processing method as claimed in claim 1, further comprising the steps of:
reading the first message queue and storing the first message queue into a first database;
reading the first database, and generating first type statistical data offline in response to the set first type indication information; and/or reading the first database, and generating third type of statistical data offline in response to the set first type of response information.
3. The multi-service system data processing method as claimed in claim 1, further comprising the steps of:
reading the first message queue and storing the first message queue into a first database;
reading the first database, and generating second type statistical data offline in response to the set second type indication information; and/or reading the first database, and generating fourth type statistical data offline in response to the set second type response information;
and any service module reads at least part of the second type of statistical data and/or at least part of the fourth type of statistical data from the local and outputs a response to the network side.
4. The multi-service system data processing method as claimed in claim 1, further comprising the steps of:
reading the second message queue and storing the second message queue in a second database;
and any service module acquires at least one part of data in the first type of statistical data and/or at least one part of data in the third type of statistical data from the second database and outputs a response to the network side.
5. The multi-service system data processing method as claimed in claim 2, further comprising the steps of:
reading the second message queue and storing the second message queue in a second database;
and comparing the second database with the offline processing result.
6. The multi-service system data processing method as claimed in claim 3, further comprising the steps of:
timing a process of generating first type statistical data on line according to first type indication information, and changing certain first type indication information into second type indication information in response to the time length of the certain first type indication information exceeding a set threshold; and/or timing the process of generating the third type of statistical data on line according to the first type of response information, and changing the first type of response information into the second type of response information in response to the time length of the first type of response information exceeding a set threshold.
7. A multi-service system data processing apparatus for implementing the multi-service system data processing method according to any one of claims 1 to 6, comprising:
the first message module is used for collecting the indication information and the response information of at least one service module, generating a first message queue and issuing the first message queue;
the real-time processing module is used for responding to the first type indication information and generating first type statistical data on line and/or responding to the first type response information and generating third type statistical data on line;
and the second message module is used for collecting the first type of statistical data and the third type of statistical data, generating a second message queue and publishing the second message queue for any service module to subscribe.
8. The multi-service system data processing apparatus of claim 7, further comprising:
a first database for storing a first message queue;
the off-line processing module is used for reading the first database and generating first-type statistical data and/or third-type statistical data off-line;
and the second database is used for routinely importing the second message queue to form a massively parallel processing computing engine for reading by any service module.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-6.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1-6 when executing the computer program.
CN202310389936.7A 2023-04-13 2023-04-13 Multi-service system data processing method and device Active CN116132540B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310389936.7A CN116132540B (en) 2023-04-13 2023-04-13 Multi-service system data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310389936.7A CN116132540B (en) 2023-04-13 2023-04-13 Multi-service system data processing method and device

Publications (2)

Publication Number Publication Date
CN116132540A CN116132540A (en) 2023-05-16
CN116132540B true CN116132540B (en) 2023-08-01

Family

ID=86301287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310389936.7A Active CN116132540B (en) 2023-04-13 2023-04-13 Multi-service system data processing method and device

Country Status (1)

Country Link
CN (1) CN116132540B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582537A (en) * 2018-11-07 2019-04-05 阿里巴巴集团控股有限公司 Service security means of defence and its system
CN109753531A (en) * 2018-12-26 2019-05-14 深圳市麦谷科技有限公司 A kind of big data statistical method, system, computer equipment and storage medium
CN110147398A (en) * 2019-04-25 2019-08-20 北京字节跳动网络技术有限公司 A kind of data processing method, device, medium and electronic equipment
CN111371892A (en) * 2020-03-05 2020-07-03 中国银行股份有限公司 High-concurrency distributed message pushing system and method
CN112181678A (en) * 2020-09-10 2021-01-05 珠海格力电器股份有限公司 Service data processing method, device and system, storage medium and electronic device
CN112506978A (en) * 2020-12-15 2021-03-16 中国联合网络通信集团有限公司 Big data real-time processing method, device and equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180341956A1 (en) * 2017-05-26 2018-11-29 Digital River, Inc. Real-Time Web Analytics System and Method

Also Published As

Publication number Publication date
CN116132540A (en) 2023-05-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant