CN110716909A

CN110716909A - Commercial system based on data analysis management

Info

Publication number: CN110716909A
Application number: CN201910936400.6A
Authority: CN
Inventors: 李振宏; 林良; 谭绍炜
Original assignee: Guangzhou Dining Road Information Technology Co Ltd
Current assignee: Guangzhou Dining Road Information Technology Co Ltd
Priority date: 2019-09-29
Filing date: 2019-09-29
Publication date: 2020-01-21

Abstract

The invention discloses a commercial system based on data analysis management. The system adopts the distributed service framework Dubbo and comprises a full-link log module, in which Candao Sleuth marks the link in the code, Flume collects the logs and sends them to a Kafka message queue for buffering, and Logstash consumes the logs from Kafka and inserts them into Elasticsearch, so that full-link logs are stored and queried. By entering keywords in the log backend, all related logs can be queried easily, and the log of an entire link can be viewed by log ID, which greatly improves efficiency and gives the system good practicability.

Description

Commercial system based on data analysis management

Technical Field

The invention belongs to the technical field of data processing, and particularly relates to a commercial system based on data analysis management.

Background

With the development of internet technology, many industries have begun to introduce big data and Internet of Things technology in order to serve the public more efficiently. In the taxi industry, for example, waiting at the roadside for a taxi has given way to booking a ride online; in the traditional mode, a taxi that comes into view is not necessarily empty. In the catering industry, ordering in person has given way to ordering online. In-person ordering is generally concentrated around regular mealtimes, so a shop's business hours are concentrated and the business easily becomes saturated; online orders are spread over more time periods, the shop's effective business hours are extended, and the saturation and paralysis caused by concentrated ordering are less likely to occur. Realizing these benefits, however, depends on the optimized integration of data and the quick response of the ordering system or ride-hailing system.

In the prior art, an intelligent system may adopt a distributed architecture. In a distributed architecture, however, the system is split into a plurality of subsystems, an ordinary request may need to be processed by several subsystems before a response is returned, and each subsystem is deployed as a cluster on N servers. If logs are queried by manually searching log files on the servers, efficiency is very low.

Disclosure of Invention

The invention aims to provide a commercial system based on data analysis management in which keywords can be entered in the log backend, all related logs can be queried easily, and the log of an entire link can be viewed by log ID, which greatly improves efficiency and gives the system good practicability.

The invention is mainly realized by the following technical scheme: a commercial system based on data analysis management adopts the distributed service framework Dubbo and comprises a full-link log module, in which Candao Sleuth marks the link in the code, Flume collects the logs and sends them to a Kafka message queue for buffering, and Logstash consumes the logs from Kafka and inserts them into Elasticsearch, so that full-link logs are stored and queried.

In order to better realize the invention, further, the log stream flows to ELK for log storage, analysis and display, so that maintenance personnel can search for useful information in massive log data in real time; the log stream flows to Spark Streaming for real-time stream computation, which analyzes the current running state of the system from the massive data in near real time and performs monitoring and early warning; all logs enter the distributed file system HDFS, and the system's performance over the previous day is analyzed on a schedule in the early morning of each day, to serve as a reference for the overall performance trend and for performance optimization of the system.

In order to better implement the invention, the system further comprises a real-time early warning module, which issues real-time early warnings by SMS and email according to configured rules, by means of Spark Streaming real-time stream computation.

In order to better implement the invention, the system further comprises a performance analysis module, which analyzes each day's logs on a schedule through Hadoop big data processing and compiles statistics on the system's daily performance.

In order to better implement the present invention, the system further comprises a data storage module, which comprises a business data storage unit, a log storage unit and a big data storage unit. The business data storage unit uses MongoDB to store business data; it is built on a replica set architecture and supports failover and read-write separation, which ensures the stability and high availability of the database. The log storage unit uses Elasticsearch to store logs; it is deployed as a cluster and can be expanded rapidly, which ensures efficient storage and retrieval of logs. The big data storage unit uses HDFS to store big data; it is deployed as a cluster and can be expanded rapidly, which ensures the storage and analysis of massive data.

In order to better implement the present invention, further, the data storage module includes a KV storage unit; the KV storage unit uses Redis as a cache and is deployed in primary-standby mode, which ensures high availability and improves the performance of the whole system.

In order to better implement the invention, further, the whole system runs on a 64-bit Linux CentOS 7.2 system.

In order to better realize the invention, further, a Disconf distributed configuration center is introduced into the system, so that the configuration can be uniformly managed and maintained; an XXL-Job distributed scheduling system is introduced into the system.

The invention has the beneficial effects that:

(1) Keywords can be entered in the log backend, all related logs can be queried easily, and the log of an entire link can be viewed by log ID, which greatly improves efficiency.

(2) Potential system hazards are resolved at an early stage through real-time early warning, so that faults are avoided.

(3) The system can be continuously kept under control and optimized in a timely manner through performance statistics.

(4) A Disconf distributed configuration center is introduced, and unified management and maintenance can be performed on configuration.

Drawings

FIG. 1 is a functional block diagram of the present invention;

FIG. 2 is a flowchart of log processing.

Detailed Description

The present invention will be described in further detail with reference to preferred examples thereof, but the present invention is not limited thereto. Wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functionality throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.

Example 1:

A commercial system based on data analysis management adopts the distributed service framework Dubbo and comprises a full-link log module, in which Candao Sleuth marks the link in the code, Flume collects the logs and sends them to a Kafka message queue for buffering, and Logstash consumes the logs from Kafka and inserts them into Elasticsearch, so that full-link logs are stored and queried.

Keywords can be entered in the log backend, all related logs can be queried easily, and the log of an entire link can be viewed by log ID, which greatly improves efficiency and gives the system good practicability.
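The two queries described in this embodiment (by keyword, then by log ID across the whole link) can be illustrated with a minimal sketch. It is not the patented implementation: an in-memory list stands in for Elasticsearch, the `trace_id` field stands in for the link mark written by Candao Sleuth, and the names `InMemoryLogStore` and `log_event` are hypothetical.

```python
import logging


class InMemoryLogStore(logging.Handler):
    """Toy stand-in for the log store; a real deployment would index into Elasticsearch."""

    def __init__(self):
        super().__init__()
        self.entries = []  # (trace_id, service, message) tuples in arrival order

    def emit(self, record):
        # trace_id is attached by log_event() through the `extra` mapping
        trace_id = getattr(record, "trace_id", None)
        self.entries.append((trace_id, record.name, record.getMessage()))

    def search(self, keyword):
        """Return every entry whose message contains the keyword."""
        return [e for e in self.entries if keyword in e[2]]

    def by_trace(self, trace_id):
        """Return the whole link: every entry that carries the given trace ID."""
        return [e for e in self.entries if e[0] == trace_id]


def log_event(store, trace_id, service, message):
    """Log one message for `service`, tagged with the request's trace ID."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if store not in logger.handlers:
        logger.addHandler(store)
    logger.info(message, extra={"trace_id": trace_id})
```

A keyword search returns entries from any subsystem; taking the trace ID of one hit and passing it to `by_trace` then returns every entry the same request produced, in order.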

Example 2:

This embodiment is optimized on the basis of embodiment 1. As shown in FIG. 2, the log stream flows to ELK for log storage, analysis and display, so that maintenance personnel can search for useful information in massive log data in real time; the log stream flows to Spark Streaming for real-time stream computation, which analyzes the current running state of the system from the massive data in near real time and performs monitoring and early warning; all logs enter the distributed file system HDFS, and the system's performance over the previous day is analyzed on a schedule in the early morning of each day, to serve as a reference for the overall performance trend and for performance optimization of the system.

Other parts of this embodiment are the same as embodiment 1, and thus are not described again.

Example 3:

This embodiment is optimized on the basis of embodiment 1 or 2, and further comprises a real-time early warning module, which issues real-time early warnings by SMS and email according to configured rules, by means of Spark Streaming real-time stream computation. The system also comprises a performance analysis module, which analyzes each day's logs on a schedule through Hadoop big data processing and compiles statistics on the system's daily performance.

The rest of this embodiment is the same as embodiment 1 or 2, and therefore, the description thereof is omitted.
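As an illustration of rule-based early warning, the following minimal sketch evaluates one configured rule over a stream of log lines. It runs in a single process rather than on Spark Streaming, and the rule shape (a keyword, a hit threshold and a time window) as well as the names `AlertRule` and `RuleEvaluator` are assumptions; the patent does not specify what a configured rule contains. The `notify` callable stands in for the SMS and email channels.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class AlertRule:
    name: str
    keyword: str         # substring that marks a log line as a hit
    threshold: int       # number of hits inside the window that triggers a warning
    window_seconds: int  # length of the sliding window


class RuleEvaluator:
    def __init__(self, rule, notify):
        self.rule = rule
        self.notify = notify  # callable taking one message string
        self.hits = deque()   # timestamps of matching lines, oldest first

    def observe(self, timestamp, line):
        """Feed one log line; return True if this line triggered a warning."""
        if self.rule.keyword not in line:
            return False
        self.hits.append(timestamp)
        # Discard hits that have fallen out of the sliding window.
        while self.hits and self.hits[0] <= timestamp - self.rule.window_seconds:
            self.hits.popleft()
        if len(self.hits) >= self.rule.threshold:
            self.notify(
                f"[{self.rule.name}] {len(self.hits)} hits "
                f"in {self.rule.window_seconds}s"
            )
            self.hits.clear()  # do not warn again for the same burst
            return True
        return False
```

Clearing the window after a warning is a design choice made here to avoid repeated notifications for one burst; a real rule engine might instead apply a cool-down period.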

Example 4:

This embodiment is optimized on the basis of any one of embodiments 1 to 3. As shown in FIG. 1, the system further comprises a data storage module, which comprises a business data storage unit, a log storage unit and a big data storage unit. The business data storage unit uses MongoDB to store business data; it is built on a replica set architecture and supports failover and read-write separation, which ensures the stability and high availability of the database. The log storage unit uses Elasticsearch to store logs; it is deployed as a cluster and can be expanded rapidly, which ensures efficient storage and retrieval of logs. The big data storage unit uses HDFS to store big data; it is deployed as a cluster and can be expanded rapidly, which ensures the storage and analysis of massive data.

The data storage module also comprises a KV storage unit; the KV storage unit uses Redis as a cache and is deployed in primary-standby mode, which ensures high availability and improves the performance of the whole system.

Other parts of this embodiment are the same as those of any of embodiments 1 to 3, and thus are not described again.
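How a KV cache placed in front of the business database can improve performance may be sketched with the common cache-aside read pattern. The patent does not state which caching pattern is used, so this is an assumption; `TTLCache` is an in-process stand-in for Redis, and `load_from_db` is a hypothetical callable standing in for a MongoDB query.

```python
import time


class TTLCache:
    """In-process stand-in for the Redis KV cache, with a fixed time-to-live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so that expiry can be tested
        self._data = {}     # key -> (value, expires_at)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self.clock() >= expires_at:
            del self._data[key]  # entry has expired
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)


def read_through(cache, key, load_from_db):
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    value = cache.get(key)
    if value is not None:
        return value
    value = load_from_db(key)
    if value is not None:
        cache.set(key, value)  # populate the cache for later reads
    return value
```

Repeated reads of the same key within the time-to-live are served from the cache, so the database is queried only once per key per expiry period.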

Example 5:

This embodiment is optimized on the basis of any one of embodiments 1 to 4. The system runs on a 64-bit Linux CentOS 7.2 system. A Disconf distributed configuration center is introduced into the system to manage and maintain the configuration in a unified manner, and an XXL-Job distributed scheduling system is also introduced.
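The idea of unified configuration management can be sketched as a single store that pushes each change to every subscribed service node. This is a conceptual illustration only and does not use or reproduce the Disconf API; `ConfigCenter`, `ServiceNode` and the key `db.pool.size` are hypothetical.

```python
class ConfigCenter:
    """Toy central configuration store with change notification."""

    def __init__(self):
        self._values = {}
        self._watchers = {}  # key -> list of callbacks

    def watch(self, key, callback):
        """Subscribe to a key; the current value, if any, is delivered at once."""
        self._watchers.setdefault(key, []).append(callback)
        if key in self._values:
            callback(self._values[key])

    def publish(self, key, value):
        """Set a value in one place and push it to every subscriber."""
        self._values[key] = value
        for callback in self._watchers.get(key, []):
            callback(value)


class ServiceNode:
    """A service instance that keeps its local copy of one setting in sync."""

    def __init__(self, name, center, key):
        self.name = name
        self.value = None
        center.watch(key, self._on_change)

    def _on_change(self, value):
        self.value = value
```

With this arrangement a setting is edited once in the center instead of in a configuration file on each server.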

Example 6:

A commercial system based on data analysis management, as shown in FIG. 1 and FIG. 2, mainly comprises the following:

Distributed service framework: the mature open-source distributed framework Dubbo is used.

Distributed task scheduling: the distributed task scheduling framework XXL-JOB is used.

Distributed configuration center: the distributed configuration management framework Disconf is used.

Full-link logging: Candao Sleuth marks the link in the code, Flume collects the logs and sends them to a Kafka message queue for buffering, and Logstash consumes the logs from Kafka and inserts them into Elasticsearch, so that full-link logs are stored and queried.

Real-time early warning: real-time early warnings are issued by SMS and email according to configured rules, by means of Spark Streaming real-time stream computation.

Performance analysis: each day's logs are analyzed on a schedule through Hadoop big data processing, and statistics on the system's daily performance are compiled.

Data storage:

Business data storage: MongoDB is used to store business data; it is built on a replica set architecture and supports failover and read-write separation, which ensures the stability and high availability of the database.

Log storage: Elasticsearch is used to store logs; it is deployed as a cluster and can be expanded rapidly, which ensures efficient storage and retrieval of logs.

Big data storage: HDFS is used to store big data; it is deployed as a cluster and can be expanded rapidly, which ensures the storage and analysis of massive data.

KV storage: Redis is used as a cache and is deployed in primary-standby mode, which ensures high availability and improves the performance of the whole system.

Operating environment: the whole system runs on a 64-bit Linux CentOS 7.2 system.

Full-link logging / early warning notification / performance statistics:

Full-link logging: in a distributed architecture, the system is split into a plurality of subsystems, an ordinary request may need to be processed by several subsystems before a response is returned, and each subsystem is deployed as a cluster on N servers. Querying logs by manually searching log files on the servers is very inefficient, so a full-link log query system is needed: keywords are entered in the log backend, all related logs are queried easily, and the log of the entire link can be viewed by log ID, which greatly improves efficiency.

Early warning notification: potential system hazards are resolved at an early stage through real-time early warning, so that faults are avoided.

Performance statistics: the system is continuously kept under control and optimized through performance statistics.
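The kind of daily performance statistics mentioned above can be illustrated with a small single-machine sketch; the patent performs this analysis with Hadoop, which is not reproduced here. The log line layout `timestamp endpoint duration_ms` and the function name `daily_stats` are assumptions made for the example.

```python
import math
from collections import defaultdict
from statistics import mean


def daily_stats(lines):
    """Aggregate one day's log lines into per-endpoint latency statistics."""
    durations = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed layout
        _, endpoint, raw_ms = parts
        try:
            value = float(raw_ms)
        except ValueError:
            continue  # skip lines whose duration is not a number
        durations[endpoint].append(value)

    report = {}
    for endpoint, values in durations.items():
        values.sort()
        # Nearest-rank 95th percentile.
        rank = max(1, math.ceil(0.95 * len(values)))
        report[endpoint] = {
            "count": len(values),
            "mean_ms": mean(values),
            "p95_ms": values[rank - 1],
            "max_ms": values[-1],
        }
    return report
```

Comparing such a report from one day to the next gives the performance trend that the description uses as a reference for optimization.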

As shown in FIG. 2, the system is implemented as follows: Flume collects and aggregates the logs of each server, the Kafka message queue buffers them, and the logs then flow to the following three destinations:

ELK (Elasticsearch + Logstash + Kibana): log storage, analysis and display, so that maintenance personnel can search for useful information in massive log data in real time;

Spark Streaming: real-time stream computation, which analyzes the current running state of the system from the massive data in near real time and performs monitoring and early warning;

HDFS + Hadoop: all logs enter the distributed file system HDFS, and the system's performance over the previous day is analyzed on a schedule in the early morning of each day, to serve as a reference for the overall performance trend and for performance optimization of the system.
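The flow above, in which one buffered log stream feeds three independent destinations, can be sketched as an append-only buffer with a separate read offset per consumer group, which is how Kafka lets several consumer groups each read every record. The patent does not say how the three destinations subscribe, so the use of consumer groups is an assumption, and `BufferedTopic` is a hypothetical in-memory stand-in for a Kafka topic.

```python
class BufferedTopic:
    """Toy append-only buffer with one read offset per consumer group."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # group name -> index of the next unread record

    def publish(self, record):
        """Append a record; it stays available to every group."""
        self._records.append(record)

    def poll(self, group, max_records=100):
        """Return the next unread records for `group` and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch
```

Because each group keeps its own offset, a slow batch consumer (for example the HDFS writer) does not hold back the search or streaming consumers, and none of them removes records that the others still need.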

Distributed configuration center: the system contains a large amount of configuration information. Traditionally this is handled through configuration files, but in a distributed environment the configuration files grow large and become difficult to manage and maintain. A Disconf distributed configuration center is therefore introduced, so that the configuration can be managed and maintained in a unified manner.

Distributed scheduling center: the system contains various timed tasks, which need to be managed and triggered in a unified manner in a distributed environment. An XXL-Job distributed scheduling system is therefore introduced.
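Unified management and triggering of timed tasks can be illustrated with a minimal central registry that fires whichever tasks are due. This is a conceptual sketch and does not use or reproduce the XXL-Job API; the `Scheduler` class, its method names and the use of plain numeric timestamps are assumptions.

```python
import heapq


class Scheduler:
    """Toy central registry of periodic tasks, ordered by next run time."""

    def __init__(self):
        self._heap = []  # (next_run, seq, interval, name, func)
        self._seq = 0    # tie-breaker so that functions are never compared

    def register(self, name, interval, func, first_run=0):
        if interval <= 0:
            raise ValueError("interval must be positive")
        heapq.heappush(self._heap, (first_run, self._seq, interval, name, func))
        self._seq += 1

    def run_due(self, now):
        """Run every task whose next run time has arrived; return their names."""
        fired = []
        while self._heap and self._heap[0][0] <= now:
            _, seq, interval, name, func = heapq.heappop(self._heap)
            func()
            fired.append(name)
            # Reschedule relative to `now`, so a late run does not fire repeatedly.
            heapq.heappush(self._heap, (now + interval, seq, interval, name, func))
        return fired
```

Keeping all tasks in one registry means that each task's schedule is defined and triggered from a single place rather than from a timer on each server.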

The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications and equivalent variations of the above embodiments according to the technical spirit of the present invention are included in the scope of the present invention.

Claims

1. A commercial system based on data analysis management, adopting the distributed service framework Dubbo, characterized by comprising a full-link log module, wherein Candao Sleuth marks the link in the code, Flume collects the logs and sends them to a Kafka message queue for buffering, and Logstash consumes the logs from Kafka and inserts them into Elasticsearch, so that full-link logs are stored and queried.

2. The commercial system based on data analysis management of claim 1, wherein the log stream flows to ELK for log storage, analysis and display, so that maintenance personnel can search for useful information in massive log data in real time; the log stream flows to Spark Streaming for real-time stream computation, which analyzes the current running state of the system from the massive data in near real time and performs monitoring and early warning; all logs enter the distributed file system HDFS, and the system's performance over the previous day is analyzed on a schedule in the early morning of each day, to serve as a reference for the overall performance trend and for performance optimization of the system.

3. The commercial system based on data analysis management of claim 2, further comprising a real-time early warning module, which issues real-time early warnings by SMS and email according to configured rules, by means of Spark Streaming real-time stream computation.

4. The commercial system based on data analysis management of claim 2, further comprising a performance analysis module, which analyzes each day's logs on a schedule through Hadoop big data processing and compiles statistics on the system's daily performance.

5. The commercial system based on data analysis management of claim 1, further comprising a data storage module, wherein the data storage module comprises a business data storage unit, a log storage unit and a big data storage unit; the business data storage unit uses MongoDB to store business data, is built on a replica set architecture, and supports failover and read-write separation, which ensures the stability and high availability of the database; the log storage unit uses Elasticsearch to store logs, is deployed as a cluster, and can be expanded rapidly, which ensures efficient storage and retrieval of logs; the big data storage unit uses HDFS to store big data, is deployed as a cluster, and can be expanded rapidly, which ensures the storage and analysis of massive data.

6. The commercial system based on data analysis management of claim 5, wherein the data storage module comprises a KV storage unit; the KV storage unit uses Redis as a cache and is deployed in primary-standby mode, which ensures high availability and improves the performance of the whole system.

7. The commercial system based on data analysis management of claim 1, wherein the system runs on a 64-bit Linux CentOS 7.2 system.

8. The commercial system based on data analysis management of claim 7, wherein a Disconf distributed configuration center is introduced into the system to manage and maintain the configuration in a unified manner, and an XXL-Job distributed scheduling system is introduced into the system.