CN110232044B

CN110232044B - System and method for realizing big data summarizing and scheduling service

Info

Publication number: CN110232044B
Application number: CN201910521428.3A
Authority: CN
Inventors: 张胤; 戴海宏; 仪思奇
Original assignee: Inspur General Software Co Ltd
Current assignee: Inspur General Software Co Ltd
Priority date: 2019-06-17
Filing date: 2019-06-17
Publication date: 2023-03-28
Anticipated expiration: 2039-06-17
Also published as: CN110232044A

Abstract

The invention discloses a system and a method for implementing a big data summarizing and scheduling service, and belongs to the technical field of big data storage. The system comprises a cloud server and a local server. The cloud server provides a cloud Web API (Web application program interface) secured by salted hashing and the document database ElasticSearch; the local server provides a local scheduling service and a local relational database. The cloud server calls ElasticSearch to run aggregation and grouping queries on the data of interest to the user, the local scheduling service performs summarizing and scheduling at configured time intervals, and the local relational database receives the summarized data returned by the local scheduling service. The system is suitable for classified, aggregated and summarized queries over big data, solves the association problem between cloud servers and between big data and relational data, and has good value for popularization and application.

Description

System and method for realizing big data summarizing and scheduling service

Technical Field

The invention relates to the technical field of big data storage, and particularly provides a system and a method for realizing big data summary scheduling service.

Background

As user behavior data generated by the system continuously increases, a large amount of behavior data is no longer suitable for manual reading or viewing, and therefore, some technologies are often adopted to analyze and sort existing user behaviors and display user data information in a chart or other easy-to-read modes.

As the storage options for collected behavior data expand and diversify, it is desirable that the collected data can be output to more media. To improve the experience of retrieving behavior data in real time, more and more users require fast retrieval of massive behavior data. With the advent of the era of personalized charts, the demand for friendly Web interfaces and a better user experience keeps growing. To meet these requirements, ElasticSearch has become the document database of choice.

At present, to relieve the pressure of big data transmission, a reverse proxy and a disk queue are deployed on the cloud server, and a multi-node cluster is used to process the data. A clustered ElasticSearch deployment that uses Nginx as the reverse proxy and an Apache Kafka disk queue to cache requests has therefore become an indispensable deployment scheme. A Web API secured by a salted hash of a timestamp enhances the security of APIs called over the public network and is also an industry standard. For configurable, automated job scheduling, the Quartz scheduling service is widely accepted in the industry. In summary, a big data summarizing and scheduling service can be built on ElasticSearch, a reliable Web API and the Quartz component while ensuring user experience, reliability, security and robustness.

Disclosure of Invention

The technical task of the invention is to provide a system for realizing big data summarization scheduling service, which is suitable for the classification, aggregation, summarization and query of big data and solves the association problem between cloud servers and between big data and relational data.

The invention further provides a method for realizing the big data summarizing and scheduling service.

In order to achieve the purpose, the invention provides the following technical scheme:

A system for implementing a big data summarizing and scheduling service comprises a cloud server and a local server, wherein the cloud server provides a cloud Web API secured by salted hashing and the document database ElasticSearch, the local server provides a local scheduling service and a local relational database, the cloud server calls ElasticSearch to run aggregation and grouping queries on the data of interest to the user, the local scheduling service performs summarizing and scheduling at configured time intervals, and the local relational database receives the summarized data returned by the local scheduling service.

The local relational database defaults to Microsoft SQL Server; MySQL and PostgreSQL are alternatives.

Preferably, the local scheduling service provided by the local server periodically sends HTTP requests to the cloud Web API of the cloud server at the locally configured time interval, and inserts the returned summarized metric data into the local relational database.

For the encryption scheme, the local server takes a timestamp and a salt value as base values and computes a hexadecimal digest by MD5 hashing, which serves as the basis for judging whether a message is safe. The cloud server checks the transmitted salt value and hash value: if the hash code it generates is the same as the hash code transmitted by the local server, the message is considered safe.
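The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source does not say how the timestamp and salt are combined, so plain string concatenation is assumed here, and the function names `md5_hex` and `is_message_safe` are hypothetical.

```python
import hashlib
import hmac


def md5_hex(timestamp: str, salt: str) -> str:
    """Return the hexadecimal MD5 digest of timestamp + salt (concatenation is an assumption)."""
    return hashlib.md5((timestamp + salt).encode("utf-8")).hexdigest()


def is_message_safe(timestamp: str, salt: str, received_digest: str) -> bool:
    """Cloud-side check: recompute the digest and compare it with the transmitted one."""
    expected = md5_hex(timestamp, salt)
    # compare_digest performs the comparison in constant time
    return hmac.compare_digest(expected, received_digest.lower())
```

The local server would call `md5_hex` before sending, and the cloud server would call `is_message_safe` with the values it receives; a mismatch in either the timestamp or the salt causes the check to fail.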

Preferably, the metric data provided by the cloud server are the PV (page view) count, the exception count and the active user count, and the parameters it receives are a time period parameter and a metric type parameter.

Preferably, the users' PV details and exception details are aggregated and grouped by date in the document database ElasticSearch to obtain the PV count, exception count and active user count required by the user.
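A grouping-by-date aggregation of this kind might be expressed as an ElasticSearch search body like the one built below. This is a sketch, not the query used in the invention: the field names `timestamp` and `user_id` are hypothetical, each PV detail is assumed to be stored as one document, and the `calendar_interval` parameter assumes ElasticSearch 7.x or later.

```python
import json


def build_daily_metrics_query(start: str, end: str) -> dict:
    """Build a search body that buckets documents by day and counts distinct users."""
    return {
        "size": 0,  # only aggregation results are needed, not the matching documents
        "query": {
            # restrict to the requested time period (hypothetical field name)
            "range": {"timestamp": {"gte": start, "lt": end}}
        },
        "aggs": {
            "per_day": {
                "date_histogram": {
                    "field": "timestamp",
                    "calendar_interval": "day",
                },
                "aggs": {
                    # approximate distinct count of users in each daily bucket
                    "active_users": {"cardinality": {"field": "user_id"}}
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_daily_metrics_query("2019-06-01", "2019-06-18"), indent=2))
```

Under these assumptions, the `doc_count` of each `per_day` bucket gives the PV count for that day, and `active_users.value` gives the number of active users; the same body run against an index of exception details would give the exception count.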

Preferably, the local scheduling service realizes a custom scheduling task based on an open source scheduling library Quartz.

Through the open-source scheduling library Quartz, HTTP requests are sent periodically to the Web API (application program interface) of the cloud server at the locally configured time interval, and the relevant metrics are inserted into the local relational database according to the returned summarized metric data.

A method for implementing a big data summarizing and scheduling service comprises grouped aggregation queries over big data, and summarization through the cloud Web API and the local scheduling service into a local relational database, and specifically comprises the following steps:

S1, the local server starts the local scheduling service and periodically sends an HTTP request to the cloud Web API of the cloud server;

S2, on receiving the HTTP request, the cloud server performs a security check on the transmitted data and fetches data only after security is confirmed;

S3, according to the security check in step S2 and the data type requested through the Web API, the cloud server calls the document database ElasticSearch to aggregate and group the data of interest to the user;

S4, the cloud server returns the metric data;

S5, the local server receives the metric data returned by the cloud server and stores the summarized metric data.

Preferably, in step S1, a salt value parameter and an MD5 hash value parameter are added to the HTTP request so that its security can be verified.

When the local server starts the local scheduling service, the collection interval for acquiring metric data can be configured flexibly, and the user-defined interval configuration is read through the Quartz component. Compared with a conventional timer component, Quartz supports more complex interval configurations: for example, according to user requirements, data can be collected once every N minutes during the daily peak working period and once every 4N minutes during off-duty hours. Quartz integrates easily with Spring Boot applications, so the scheduling service is released in production as a microservice. The Spring Boot and Quartz components are integrated, packaged into a Docker image, and published to a Docker container on the local host, which completes the microservice deployment.
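The N-minute versus 4N-minute policy can be illustrated with a small, library-free Python sketch. It does not use Quartz; the peak window of 09:00 to 18:00 and the function names are assumptions made for the example.

```python
from datetime import datetime, time, timedelta


def collection_interval(now: datetime, n_minutes: int,
                        peak_start: time = time(9, 0),
                        peak_end: time = time(18, 0)) -> timedelta:
    """Return N minutes inside the peak window and 4N minutes outside it."""
    in_peak = peak_start <= now.time() < peak_end
    return timedelta(minutes=n_minutes if in_peak else 4 * n_minutes)


def next_run(now: datetime, n_minutes: int) -> datetime:
    """Compute the next collection time from the current time."""
    return now + collection_interval(now, n_minutes)
```

In Quartz itself, a policy like this would more likely be written as two cron triggers rather than computed by hand, for example `0 0/5 9-17 * * ?` for a collection every 5 minutes during the hours 9 to 17, with a second trigger covering the remaining hours at the longer interval.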

Preferably, in step S3, the summarized data is grouped and aggregated according to an aggregation requirement and a data fetching convention, where the aggregation requirement is grouping by time, by enterprise or by user, and the data fetching convention is the PV count, the exception count or the user count.
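To make the combination of a grouping dimension and a fetched metric concrete, the following in-memory Python sketch computes the same kind of summary over a list of records. In the invention this work is done by ElasticSearch; the record layout (`date`, `enterprise_id`, `user_id`, `kind`) and the function name `summarize` are hypothetical.

```python
from collections import defaultdict

# Maps each aggregation requirement to the (hypothetical) record field it groups by.
GROUP_KEYS = {"time": "date", "enterprise": "enterprise_id", "user": "user_id"}


def summarize(records, group_by: str, metric: str) -> dict:
    """Group records by the chosen dimension and compute one metric per group."""
    key_field = GROUP_KEYS[group_by]
    groups = defaultdict(list)
    for record in records:
        groups[record[key_field]].append(record)

    result = {}
    for key, rows in groups.items():
        if metric == "pv":
            result[key] = sum(1 for r in rows if r["kind"] == "pv")
        elif metric == "exception":
            result[key] = sum(1 for r in rows if r["kind"] == "exception")
        elif metric == "users":
            # distinct users seen in the group
            result[key] = len({r["user_id"] for r in rows})
        else:
            raise ValueError(f"unknown metric: {metric}")
    return result
```

For instance, `summarize(records, "time", "pv")` yields one PV count per date, while `summarize(records, "enterprise", "users")` yields the number of distinct users per enterprise.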

Security authentication of the Web API: the basic information of the HTTP message body comprises a salt value parameter, an MD5 digest parameter and a metric data list parameter. The salt parameter is composed of the current timestamp, a random number and two public marker strings. The MD5 digest is the hexadecimal message data produced by MD5-hashing the public marker strings combined with the salt. The metric data list is the specific metric data (PV count, exception count or user count) requested by the user.
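A message body with these three parameters could be assembled as in the Python sketch below. The source does not specify the marker strings, the order in which the components are joined, or the JSON key names, so `MARK_A`, `MARK_B`, the concatenation order and the keys `salt`, `digest` and `metrics` are all assumptions.

```python
import hashlib
import secrets
import time

# Hypothetical public marker strings; the real values are not given in the source.
MARK_A = "markA"
MARK_B = "markB"


def build_salt(now=None, nonce=None) -> str:
    """Compose the salt from the timestamp, a random number and the two markers."""
    timestamp = int(time.time()) if now is None else now
    random_number = secrets.randbelow(10**6) if nonce is None else nonce
    return f"{MARK_A}{timestamp}{random_number}{MARK_B}"  # ordering is an assumption


def build_message(metrics, now=None, nonce=None) -> dict:
    """Build the message body: salt, hexadecimal MD5 digest, requested metric list."""
    salt = build_salt(now, nonce)
    digest = hashlib.md5((MARK_A + MARK_B + salt).encode("utf-8")).hexdigest()
    return {"salt": salt, "digest": digest, "metrics": list(metrics)}
```

Passing `now` and `nonce` explicitly makes the output reproducible, which is convenient for testing; in normal use both are left unset so that each message carries a fresh salt.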

Preferably, in step S5, the local server receives the metric data returned by the cloud server, deserializes the returned JSON data into a list of data objects to be stored in the local relational database, and inserts the corresponding metric data into the relational database.
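The deserialize-then-insert step can be sketched as follows. SQLite stands in for the local relational database purely so that the example is self-contained; the JSON layout, the `MetricRow` class and the `metric_summary` table are hypothetical.

```python
import json
import sqlite3
from dataclasses import astuple, dataclass
from typing import List


@dataclass
class MetricRow:
    day: str
    metric: str
    value: int


def parse_rows(payload: str) -> List[MetricRow]:
    """Deserialize the returned JSON array into a list of data objects."""
    return [
        MetricRow(item["day"], item["metric"], int(item["value"]))
        for item in json.loads(payload)
    ]


def store_rows(conn: sqlite3.Connection, rows: List[MetricRow]) -> int:
    """Insert the rows into the summary table and return how many were written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metric_summary "
        "(day TEXT, metric TEXT, value INTEGER)"
    )
    conn.executemany(
        "INSERT INTO metric_summary (day, metric, value) VALUES (?, ?, ?)",
        [astuple(row) for row in rows],
    )
    conn.commit()
    return len(rows)
```

Parameterized inserts are used so that values from the response are never spliced into the SQL text.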

Compared with the prior art, the method for implementing the big data summarizing and scheduling service has the following outstanding beneficial effects. The method relies on the document database ElasticSearch, an open-source big data search engine based on Apache Lucene that provides distributed big data storage and full-text search with multi-user capability. The invention provides a grouped aggregation query interface based on the open-source search engine ElasticSearch, publishes a Web service secured by salted hashing, implements custom scheduling tasks based on the open-source scheduling library Quartz, and isolates queries against the search engine database through salted-hash security authentication. It thereby achieves a highly secure and highly reliable data summarizing and scheduling service, is suitable for classified, aggregated and summarized queries over big data, solves the association problem between cloud servers and between big data and relational data, and has good value for popularization and application.

Drawings

Fig. 1 is an architecture diagram of a system for implementing big data summarization scheduling service according to the present invention.

Detailed Description

The following describes the implementation system and method of big data summary scheduling service in detail with reference to the accompanying drawings and embodiments.

Examples

As shown in fig. 1, the implementation system of the big data summarization scheduling service of the present invention includes a cloud server and a local server.

The cloud server provides a cloud Web API secured by salted hashing and the document database ElasticSearch, and the local server provides a local scheduling service and a local relational database. The cloud server calls ElasticSearch to run aggregation and grouping queries on the data of interest to the user, the local scheduling service performs summarizing and scheduling at configured time intervals, and the local relational database receives the summarized data returned by the local scheduling service.

The local relational database is Microsoft SQL Server.

The local scheduling service provided by the local server periodically sends HTTP requests to the cloud Web API of the cloud server at the locally configured time interval, and inserts the returned summarized metric data into the local relational database. For the encryption scheme, the local server takes a timestamp and a salt value as base values and computes a hexadecimal digest by MD5 hashing, which serves as the basis for judging whether a message is safe; the cloud server checks the transmitted salt value and hash value, and if the hash code it generates is the same as the hash code transmitted by the local server, the message is considered safe. The local scheduling service implements custom scheduling tasks based on the open-source scheduling library Quartz: through Quartz, HTTP requests are sent periodically to the Web API (application program interface) of the cloud server at the locally configured time interval, and the relevant metrics are inserted into the local relational database according to the returned summarized metric data.

The metric data provided by the cloud server are the PV count, the exception count and the active user count, and the parameters it receives are a time period parameter and a metric type parameter. The document database ElasticSearch aggregates the users' PV details and exception details and groups them by date to obtain the PV count, exception count and active user count required by the user.

The method for implementing the big data summarizing and scheduling service comprises grouped aggregation queries over big data, and summarization through the cloud Web API and the local scheduling service into a local relational database, and specifically comprises the following steps:

S1, the local server starts the local scheduling service and periodically sends an HTTP request to the cloud Web API of the cloud server. A salt value parameter and an MD5 hash value parameter are added to the HTTP request so that its security can be verified.

The code that runs in this process to send the HTTP request is as follows:

[Code listing: image not reproduced]

The code that sends the HTTP request is as follows:

[Code listing: image not reproduced]
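Because the original listings are not available in this text, the following Python sketch only illustrates what a request of this kind might look like; it is not the code of the invention. The endpoint URL, the JSON key names and the function names are hypothetical, and the salt and digest are assumed to have been computed beforehand.

```python
import json
import urllib.request


def build_request(url: str, salt: str, digest: str,
                  start: str, end: str, metric: str) -> urllib.request.Request:
    """Build a POST request carrying the security and query parameters."""
    body = json.dumps({
        "salt": salt,        # salt value parameter
        "digest": digest,    # hexadecimal MD5 hash value parameter
        "start": start,      # time period parameter (start)
        "end": end,          # time period parameter (end)
        "metric": metric,    # metric type parameter
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_summary(request: urllib.request.Request, timeout: float = 10.0):
    """Send the request and decode the JSON response returned by the cloud Web API."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A scheduled job would call `build_request` and then `fetch_summary` each time it fires; keeping the two steps separate lets the request be inspected or logged without touching the network.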

S2, on receiving the HTTP request, the cloud server performs a security check on the transmitted data and fetches data only after security is confirmed.

S3, according to the security check in step S2 and the data type requested through the Web API, the cloud server calls the document database ElasticSearch to aggregate and group the data of interest to the user.

The summarized data is grouped and aggregated according to an aggregation requirement and a data fetching convention, where the aggregation requirement is grouping by time, by enterprise or by user, and the data fetching convention is the PV count, the exception count or the user count.

The communication decryption code of the handshake protocol is as follows:

[Code listing: image not reproduced]

S4, the cloud server returns the metric data.

S5, the local server receives the metric data returned by the cloud server and stores the summarized metric data.

The local server receives the metric data returned by the cloud server, deserializes the returned JSON data into a list of data objects to be stored in the local relational database, and inserts the corresponding metric data into the relational database.

The above-described embodiments are merely preferred embodiments of the present invention, and general changes and substitutions by those skilled in the art within the technical scope of the present invention are included in the protection scope of the present invention.

Claims

1. A system for implementing a big data summarizing and scheduling service, characterized in that: the system comprises a cloud server and a local server; the cloud server provides a cloud Web API secured by salted hashing and the document database ElasticSearch; the local server provides a local scheduling service and a local relational database; the cloud server calls the document database ElasticSearch to run aggregation and grouping queries on the data of interest to the user; the local scheduling service performs summarizing and scheduling at configured time intervals; the local relational database receives the summarized data returned by the local scheduling service; the local scheduling service provided by the local server periodically sends an HTTP request to the cloud Web API of the cloud server at the locally configured time interval and inserts the returned summarized metric data into the local relational database; the metric data provided by the cloud server are the PV count, the exception count and the active user count, and the parameters it receives are a time period parameter and a metric type parameter; the document database ElasticSearch aggregates the users' PV details and exception details and groups them by date to obtain the PV count, exception count and active user count required by the user; and the local scheduling service implements custom scheduling tasks based on the open-source scheduling library Quartz.

2. A method for implementing a big data summarizing and scheduling service, characterized in that: the method comprises grouped aggregation queries over big data, and summarization through the cloud Web API and the local scheduling service into a local relational database, and specifically comprises the following steps:

S1, the local server starts the local scheduling service and periodically sends an HTTP request to the cloud Web API of the cloud server, and a salt value parameter and an MD5 hash value parameter are added to the HTTP request so that its security can be verified;

S2, on receiving the HTTP request, the cloud server performs a security check on the transmitted data and fetches data only after security is confirmed;

S4, the cloud server returns the metric data, and the summarized data is grouped and aggregated according to an aggregation requirement and a data fetching convention, where the aggregation requirement is grouping by time, by enterprise or by user, and the data fetching convention is the PV count, the exception count or the user count;

S5, the local server receives the metric data returned by the cloud server and stores the summarized metric data: the local server deserializes the returned JSON data into a list of data objects to be stored in the local relational database, and inserts the corresponding metric data into the relational database.