CN110232044B - System and method for realizing big data summarizing and scheduling service - Google Patents
System and method for realizing big data summarizing and scheduling service
- Publication number
- CN110232044B
- Authority
- CN
- China
- Prior art keywords
- local
- data
- cloud
- scheduling service
- big data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/11—File system administration, e.g. details of archiving or snapshots
- G06F16/122—File system administration, e.g. details of archiving or snapshots using management policies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
- G06F16/137—Hash-based
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/14—Details of searching files based on file metadata
- G06F16/148—File search processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
- G06F16/1824—Distributed file systems implemented using Network-attached Storage [NAS] architecture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
Abstract
The invention discloses a system and a method for implementing a big data summary scheduling service, and belongs to the technical field of big data storage. The system comprises a cloud server and a local server. The cloud server provides a cloud Web API secured by a salted hash and the document database ElasticSearch; the local server provides a local scheduling service and a local relational database. The cloud server calls ElasticSearch to run aggregation and grouping queries on the data the user is interested in, the local scheduling service performs the summary scheduling according to configured time intervals, and the local relational database receives the summarized data returned by the local scheduling service. The system is suited to classified, aggregated and summarized queries over big data, solves the problem of associating cloud servers with one another and big data with relational data, and has good value for popularization and application.
Description
Technical Field
The invention relates to the technical field of big data storage, and particularly provides a system and a method for realizing big data summary scheduling service.
Background
As the user behavior data generated by a system keeps growing, large volumes of behavior data are no longer suitable for manual reading or inspection. Techniques are therefore commonly used to analyze and organize existing user behavior and to present user data as charts or in other easy-to-read forms.
As the storage options for collected behavior data expand and diversify, there is a desire to output the collected data to more media. Users increasingly expect to retrieve behavior data in real time, and more of them require fast retrieval over massive volumes of it. With the advent of personalized charts, demand for friendly Web interfaces and a better user experience is also growing. To meet these requirements, ElasticSearch has become the document database of choice.
At present, to relieve the pressure of big data transmission, a reverse proxy and a disk queue are placed on the cloud server, and a multi-node cluster is used to process the data. A deployment that uses elasticx as the reverse proxy tool, an Apache Kafka disk queue to cache requests, and a clustered ElasticSearch has therefore become an indispensable scheme. A Web API protected by a salted hash of a timestamp strengthens the security of APIs called over the public network and is also an industry standard. For configurable, automated job scheduling, the Quartz scheduling service is widely accepted in the industry. In summary, building on ElasticSearch, a reliable Web API and the Quartz component, a big data summary scheduling service can be developed while ensuring user experience, reliability, security and robustness.
Disclosure of Invention
The technical task of the invention is to provide a system for implementing a big data summary scheduling service that is suited to classified, aggregated and summarized queries over big data and that solves the problem of associating cloud servers with one another and big data with relational data.
The invention further provides a method for implementing the big data summary scheduling service.
To achieve this purpose, the invention provides the following technical scheme:
A system for implementing a big data summary scheduling service comprises a cloud server and a local server. The cloud server provides a cloud Web API secured by a salted hash and the document database ElasticSearch; the local server provides a local scheduling service and a local relational database. The cloud server calls ElasticSearch to run aggregation and grouping queries on the data the user is interested in, the local scheduling service performs the summary scheduling according to configured time intervals, and the local relational database receives the summarized data returned by the local scheduling service.
The local relational database defaults to Microsoft SQL Server; MySQL and PostgreSQL are alternatives.
Preferably, the local scheduling service provided by the local server periodically sends HTTP requests to the cloud Web API of the cloud server at the locally configured time interval, and inserts the returned summary metric data into the local relational database.
For the encryption, the local server uses the timestamp and the salt value as the base values and produces a hexadecimal digest through MD5 hashing, which serves as the basis for judging whether a message is safe. The cloud server makes this judgment from the transmitted salt value and hash value: if the hash code generated by the cloud server is the same as the hash code sent by the local server, the message is safe.
Preferably, the metric data provided by the cloud server are the PV (page view) count, the exception count and the active user count, and the parameters it receives are a time period parameter and a metric type parameter.
Preferably, the document database ElasticSearch groups the users' PV details and exception details by aggregation and by date to obtain the PV count, the exception count and the active user count required by the user.
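The date-grouped aggregation described above could be expressed as an ElasticSearch search body along the following lines. This is a minimal sketch rather than the query used in the invention: the field names `timestamp` and `user_id` are assumptions, and `calendar_interval` is the parameter name used by ElasticSearch 7.2 and later (earlier releases use `interval`). Assuming one document per page view, each bucket's `doc_count` gives the PV count for that day, and the same body run against an index of exception details would give the exception count.

```python
import json


def build_daily_metrics_query(start, end, date_field="timestamp", user_field="user_id"):
    """Return a search body that groups documents by day (field names are assumed)."""
    return {
        "size": 0,  # only aggregation results are needed, not the raw hits
        "query": {"range": {date_field: {"gte": start, "lt": end}}},
        "aggs": {
            "per_day": {
                # one bucket per calendar day; doc_count is the per-day total
                "date_histogram": {"field": date_field, "calendar_interval": "day"},
                "aggs": {
                    # approximate count of distinct users in each day bucket
                    "active_users": {"cardinality": {"field": user_field}}
                },
            }
        },
    }


body = build_daily_metrics_query("2019-06-01", "2019-06-08")
print(json.dumps(body, indent=2))
```

The body would be sent to the `_search` endpoint of whichever index holds the detail records.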
Preferably, the local scheduling service implements custom scheduling tasks based on the open-source scheduling library Quartz.
Quartz periodically sends an HTTP request to the Web API of the cloud server at the locally configured time interval, and inserts the relevant metrics into the local relational database according to the returned summary metric data.
A method for implementing a big data summary scheduling service comprises grouped aggregation queries over big data, a cloud Web API, and a local scheduling service that summarizes the results into a local relational database. It specifically comprises the following steps:
S1, the local server starts the local scheduling service and periodically sends an HTTP request to the cloud Web API of the cloud server;
S2, on receiving the HTTP request, the cloud server performs a security check on the transmitted data and fetches data only after the request is confirmed to be safe;
S3, based on the security check in step S2 and the data type requested through the Web API, the cloud server calls the document database ElasticSearch to aggregate and group the data the user is interested in;
S4, the cloud server returns the metric data;
S5, the local server receives the metric data returned by the cloud server and stores the summarized metric data.
Preferably, in step S1, a salt value parameter and an MD5 hash value parameter are added to the HTTP request so that its security can be verified.
When the local server starts the local scheduling service, the main point is that the collection interval for acquiring metric data can be configured flexibly and freely; the user-defined time interval configuration section is read through the Quartz component. Compared with a conventional timer component, Quartz supports more complex interval configurations. For example, according to the user's needs, the daily business peak period can be set to collect once every N minutes and the off-hours period once every 4N minutes. Quartz integrates easily with Spring Boot applications, so in production this scheduling service is released as a microservice: Spring Boot and the Quartz component are integrated, packaged into a Docker image, and published to a Docker container on the local host, which completes the microservice deployment.
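The peak and off-hours rule can be illustrated with a small interval calculation. This is only a sketch of the rule itself, written in Python rather than with Quartz; the peak window of 09:00 to 18:00 and the function names are assumptions made for the example.

```python
from datetime import datetime, time, timedelta


def collection_interval(now, n_minutes, peak_start=time(9, 0), peak_end=time(18, 0)):
    """Return N minutes inside the (assumed) peak window and 4N minutes outside it."""
    in_peak = peak_start <= now.time() < peak_end
    return timedelta(minutes=n_minutes if in_peak else 4 * n_minutes)


def next_run(now, n_minutes):
    """Time of the next collection, based on the interval that applies at `now`."""
    return now + collection_interval(now, n_minutes)


print(next_run(datetime(2019, 6, 17, 10, 0), 5))  # inside the peak window
print(next_run(datetime(2019, 6, 17, 22, 0), 5))  # outside the peak window
```

In a real deployment the same rule would normally be stated declaratively in the scheduler's configuration rather than computed by hand.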
Preferably, in step S3, the summarized data is grouped and aggregated according to an aggregation requirement and a data-fetching convention, where the aggregation requirement is grouping by time, by enterprise or by user, and the data-fetching convention is the PV count, the exception count or the user count.
Security authentication of the Web API: the HTTP message body carries a salt value parameter, an MD5 digest parameter and a metric data list parameter. The salt value is composed of the current timestamp, a random number and two public marker strings. The MD5 digest is the hexadecimal data produced by applying MD5 to the public marker strings combined with the salt. The metric data list names the specific metric data (PV count, exception count or user count) requested by the user.
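The message format and the check on the receiving side might look roughly like the sketch below. The marker strings, the order in which the parts are concatenated and the key names `salt`, `md5` and `metrics` are all assumptions; the text only states which ingredients go into the salt and the digest. The constant-time comparison is a choice made for this sketch.

```python
import hashlib
import hmac
import secrets
import time

# Hypothetical public marker strings; the actual values are not given in the text.
MARK_A = "MARK-A"
MARK_B = "MARK-B"


def make_salt(now=None):
    """Compose the salt from a timestamp, a random number and the two markers."""
    ts = int(time.time() if now is None else now)
    nonce = secrets.randbelow(10**6)
    return f"{MARK_A}{ts}{nonce:06d}{MARK_B}"


def digest(salt):
    """Hexadecimal MD5 digest of the markers combined with the salt (assumed order)."""
    return hashlib.md5((MARK_A + salt + MARK_B).encode("utf-8")).hexdigest()


def build_message(metrics, now=None):
    """Local side: build the message body with salt, digest and metric list."""
    salt = make_salt(now)
    return {"salt": salt, "md5": digest(salt), "metrics": metrics}


def is_safe(message):
    """Cloud side: recompute the digest from the salt and compare the two."""
    return hmac.compare_digest(digest(message["salt"]), message["md5"])


msg = build_message(["pv_count"], now=1560729600)
print(is_safe(msg))
```

A message whose salt or digest has been altered in transit fails the comparison, which is the condition under which the cloud server would refuse to fetch data.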
Preferably, in step S5, the local server receives the metric data returned by the cloud server, deserializes the returned JSON data into a list of data objects to be stored in the local relational database, and inserts the corresponding metric data into the relational database.
Compared with the prior art, the method for implementing the big data summary scheduling service has the following outstanding beneficial effects. The method relies on the document database ElasticSearch, an open-source big data search engine based on Apache Lucene that provides distributed big data storage and full-text search with multi-user capability. The invention provides a grouped aggregation query interface based on the open-source search engine ElasticSearch, publishes a Web service secured by a salted hash, implements custom scheduling tasks based on the open-source scheduling library Quartz, and uses salted-hash security authentication to isolate queries against the search engine database. It thereby achieves a highly secure and highly reliable data summary scheduling service that is suited to classified, aggregated and summarized queries over big data, solves the problem of associating cloud servers with one another and big data with relational data, and has good value for popularization and application.
Drawings
Fig. 1 is an architecture diagram of a system for implementing big data summarization scheduling service according to the present invention.
Detailed Description
The following describes the implementation system and method of big data summary scheduling service in detail with reference to the accompanying drawings and embodiments.
Examples
As shown in Fig. 1, the system for implementing the big data summary scheduling service of the present invention includes a cloud server and a local server.
The cloud server provides a cloud Web API secured by a salted hash and the document database ElasticSearch; the local server provides a local scheduling service and a local relational database. The cloud server calls ElasticSearch to run aggregation and grouping queries on the data the user is interested in, the local scheduling service performs the summary scheduling according to configured time intervals, and the local relational database receives the summarized data returned by the local scheduling service.
The local relational database is Microsoft SQL Server.
The local scheduling service provided by the local server periodically sends HTTP requests to the cloud Web API of the cloud server at the locally configured time interval, and inserts the returned summary metric data into the local relational database. For the encryption, the local server uses the timestamp and the salt value as the base values and produces a hexadecimal digest through MD5 hashing, which serves as the basis for judging whether a message is safe. The cloud server makes this judgment from the transmitted salt value and hash value: if the hash code generated by the cloud server is the same as the hash code sent by the local server, the message is safe. The local scheduling service implements custom scheduling tasks based on the open-source scheduling library Quartz. Quartz periodically sends an HTTP request to the Web API of the cloud server at the locally configured time interval, and inserts the relevant metrics into the local relational database according to the returned summary metric data.
The metric data provided by the cloud server are the PV (page view) count, the exception count and the active user count, and the parameters it receives are a time period parameter and a metric type parameter. The document database ElasticSearch groups the users' PV details and exception details by aggregation and by date to obtain the PV count, the exception count and the active user count required by the user.
The method for implementing the big data summary scheduling service comprises grouped aggregation queries over big data, a cloud Web API, and a local scheduling service that summarizes the results into a local relational database. It specifically comprises the following steps:
S1, the local server starts the local scheduling service and periodically sends an HTTP request to the cloud Web API of the cloud server. A salt value parameter and an MD5 hash value parameter are added to the HTTP request so that its security can be verified.
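As an illustration of step S1, the request could be assembled as follows. This is a hedged sketch and not the listing from the original publication: the URL, the JSON key names and the function name are hypothetical, and the salt and digest values are taken as ready-made inputs. The request is only built here, not sent.

```python
import json
import urllib.request


def build_metric_request(url, salt, md5_hex, metric_type, start, end):
    """Build (but do not send) the periodic request; all key names are assumed."""
    payload = {
        "salt": salt,               # salt value parameter
        "md5": md5_hex,             # MD5 hash value parameter
        "metricType": metric_type,  # metric type parameter
        "start": start,             # time period parameters
        "end": end,
    }
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_metric_request(
    "https://cloud.example.com/api/metrics",  # placeholder address
    "MARK-A1560729600000042MARK-B",
    "0123456789abcdef0123456789abcdef",
    "pv_count",
    "2019-06-17T00:00:00",
    "2019-06-18T00:00:00",
)
print(req.get_method(), req.full_url)
# Sending would then be: urllib.request.urlopen(req, timeout=10)
```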
Security authentication of the Web API: the HTTP message body carries a salt value parameter, an MD5 digest parameter and a metric data list parameter. The salt value is composed of the current timestamp, a random number and two public marker strings. The MD5 digest is the hexadecimal data produced by applying MD5 to the public marker strings combined with the salt. The metric data list names the specific metric data (PV count, exception count or user count) requested by the user.
When the local server starts the local scheduling service, the main point is that the collection interval for acquiring metric data can be configured flexibly and freely; the user-defined time interval configuration section is read through the Quartz component. Compared with a conventional timer component, Quartz supports more complex interval configurations. For example, according to the user's needs, the daily business peak period can be set to collect once every N minutes and the off-hours period once every 4N minutes. Quartz integrates easily with Spring Boot applications, so in production this scheduling service is released as a microservice: Spring Boot and the Quartz component are integrated, packaged into a Docker image, and published to a Docker container on the local host, which completes the microservice deployment.
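One way to state the N and 4N rule in Quartz terms is with two cron expressions, one per period. The helper below only generates the expression strings (Quartz field order: seconds, minutes, hours, day of month, month, day of week); the hour ranges and function names are assumptions for illustration. A step such as `0/20` restarts at the top of every hour, so intervals that do not divide 60 evenly are not spaced uniformly across hour boundaries.

```python
def quartz_cron(every_minutes, hours):
    """Quartz cron string firing every `every_minutes` within the given hours."""
    if not 1 <= every_minutes <= 59:
        raise ValueError("minute step must be between 1 and 59")
    # second 0, every N minutes starting at minute 0, given hours, every day
    return f"0 0/{every_minutes} {hours} * * ?"


def peak_and_off_peak(n, peak_hours="9-17", off_hours="0-8,18-23"):
    """Return (peak, off-peak) expressions for N and 4N minutes (hours assumed)."""
    return quartz_cron(n, peak_hours), quartz_cron(4 * n, off_hours)


peak, off = peak_and_off_peak(5)
print(peak)
print(off)
```

In a Quartz 2.x application each string could be handed to `CronScheduleBuilder.cronSchedule(...)` when building the trigger for the collection job.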
The code that sends the HTTP request in this process is as follows:
[Code listing: sending the HTTP request; not present in this text]
S2, on receiving the HTTP request, the cloud server performs a security check on the transmitted data and fetches data only after the request is confirmed to be safe.
S3, based on the security check in step S2 and the data type requested through the Web API, the cloud server calls the document database ElasticSearch to aggregate and group the data the user is interested in.
The summarized data is grouped and aggregated according to an aggregation requirement and a data-fetching convention, where the aggregation requirement is grouping by time, by enterprise or by user, and the data-fetching convention is the PV count, the exception count or the user count.
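The pairing of an aggregation requirement with a data-fetching convention can be pictured as a small lookup that produces the search to run. The sketch below is an assumption-laden illustration: the index names `pv-detail` and `exception-detail`, the field names and the string keys are all hypothetical, and daily buckets are assumed for time grouping.

```python
import copy

# Hypothetical field names for each grouping requirement.
GROUP_BUCKETS = {
    "time": {"date_histogram": {"field": "timestamp", "calendar_interval": "day"}},
    "enterprise": {"terms": {"field": "enterprise_id"}},
    "user": {"terms": {"field": "user_id"}},
}

# Hypothetical index holding the detail records behind each metric.
METRIC_INDEX = {
    "pv_count": "pv-detail",
    "exception_count": "exception-detail",
    "user_count": "pv-detail",
}


def build_search(group_by, metric):
    """Return (index name, search body) for one grouping and one metric."""
    if group_by not in GROUP_BUCKETS:
        raise ValueError(f"unknown grouping: {group_by}")
    if metric not in METRIC_INDEX:
        raise ValueError(f"unknown metric: {metric}")
    bucket = copy.deepcopy(GROUP_BUCKETS[group_by])
    if metric == "user_count":
        # distinct users per bucket; the other metrics use the bucket doc_count
        bucket["aggs"] = {"users": {"cardinality": {"field": "user_id"}}}
    return METRIC_INDEX[metric], {"size": 0, "aggs": {"groups": bucket}}


index, search_body = build_search("enterprise", "user_count")
print(index, search_body)
```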
The communication decryption code of the handshake protocol is as follows:
[Code listing: handshake decryption; not present in this text]
S4, the cloud server returns the metric data.
S5, the local server receives the metric data returned by the cloud server and stores the summarized metric data.
The local server deserializes the returned JSON data into a list of data objects to be stored in the local relational database and inserts the corresponding metric data into the relational database.
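Step S5 amounts to a parse-then-insert routine, sketched below under stated assumptions. The JSON layout, the `MetricRow` class and the `metric_summary` table are hypothetical, and an in-memory SQLite database stands in for the local relational database so that the example runs on its own.

```python
import json
import sqlite3
from dataclasses import astuple, dataclass


@dataclass
class MetricRow:
    metric_date: str
    metric_type: str
    value: int


def parse_metrics(payload):
    """Deserialize the returned JSON text into a list of MetricRow objects."""
    return [
        MetricRow(item["date"], item["type"], int(item["value"]))
        for item in json.loads(payload)
    ]


def store_metrics(conn, rows):
    """Insert the rows into a summary table, creating it on first use."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metric_summary "
        "(metric_date TEXT, metric_type TEXT, value INTEGER)"
    )
    conn.executemany(
        "INSERT INTO metric_summary VALUES (?, ?, ?)",
        [astuple(row) for row in rows],
    )
    conn.commit()


payload = (
    '[{"date": "2019-06-17", "type": "pv_count", "value": 1200},'
    ' {"date": "2019-06-17", "type": "exception_count", "value": 7}]'
)
conn = sqlite3.connect(":memory:")
store_metrics(conn, parse_metrics(payload))
print(conn.execute("SELECT * FROM metric_summary").fetchall())
```

Parameterized inserts keep the values returned by the cloud server out of the SQL text itself.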
The above-described embodiments are merely preferred embodiments of the present invention, and general changes and substitutions by those skilled in the art within the technical scope of the present invention are included in the protection scope of the present invention.
Claims (2)
1. A system for implementing a big data summary scheduling service, characterized in that: the system comprises a cloud server and a local server; the cloud server provides a cloud Web API secured by a salted hash and the document database ElasticSearch; the local server provides a local scheduling service and a local relational database; the cloud server calls the document database ElasticSearch to run aggregation and grouping queries on the data the user is interested in; the local scheduling service performs the summary scheduling according to configured time intervals; the local relational database receives the summarized data returned by the local scheduling service; the local scheduling service provided by the local server periodically sends an HTTP request to the cloud Web API of the cloud server at the locally configured time interval and inserts the returned summary metric data into the local relational database; the metric data provided by the cloud server are the PV count, the exception count and the active user count, and the parameters it receives are a time period parameter and a metric type parameter; the document database ElasticSearch groups the users' PV details and exception details by aggregation and by date to obtain the PV count, the exception count and the active user count required by the user; and the local scheduling service implements custom scheduling tasks based on the open-source scheduling library Quartz.
2. A method for implementing a big data summary scheduling service, characterized in that: the method comprises grouped aggregation queries over big data, a cloud Web API, and a local scheduling service that summarizes the results into a local relational database, and specifically comprises the following steps:
S1, the local server starts the local scheduling service and periodically sends an HTTP request to the cloud Web API of the cloud server, a salt value parameter and an MD5 hash value parameter being added to the HTTP request so that its security can be verified;
S2, on receiving the HTTP request, the cloud server performs a security check on the transmitted data and fetches data only after the request is confirmed to be safe;
S3, based on the security check in step S2 and the data type requested through the Web API, the cloud server calls the document database ElasticSearch to aggregate and group the data the user is interested in;
S4, the cloud server returns the metric data, the summarized data being grouped and aggregated according to an aggregation requirement and a data-fetching convention, wherein the aggregation requirement is grouping by time, by enterprise or by user, and the data-fetching convention is the PV count, the exception count or the user count;
S5, the local server receives the metric data returned by the cloud server and stores the summarized metric data, wherein the returned JSON data is deserialized into a list of data objects to be stored in the local relational database, and the corresponding metric data is inserted into the relational database.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910521428.3A CN110232044B (en) | 2019-06-17 | 2019-06-17 | System and method for realizing big data summarizing and scheduling service |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910521428.3A CN110232044B (en) | 2019-06-17 | 2019-06-17 | System and method for realizing big data summarizing and scheduling service |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110232044A CN110232044A (en) | 2019-09-13 |
CN110232044B true CN110232044B (en) | 2023-03-28 |
Family
ID=67859989
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910521428.3A Active CN110232044B (en) | 2019-06-17 | 2019-06-17 | System and method for realizing big data summarizing and scheduling service |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110232044B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110865608A (en) * | 2019-11-21 | 2020-03-06 | 武夷学院 | Reconfigurable manufacturing system |
CN113630238B (en) * | 2021-08-10 | 2024-02-23 | 中国工商银行股份有限公司 | User request permission method and device based on password confusion |
CN114338812B (en) * | 2022-01-07 | 2024-04-05 | 德微电技术(深圳)有限公司 | Equipment networking control adjusting system |
CN114564455B (en) * | 2022-02-25 | 2024-01-19 | 苏州浪潮智能科技有限公司 | Data set display method, device and equipment of distributed system and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012139328A1 (en) * | 2011-04-14 | 2012-10-18 | 中兴通讯股份有限公司 | Cache server system and application method thereof, cache client, and cache server |
CN107958080A (en) * | 2017-12-14 | 2018-04-24 | 上海特易信息科技有限公司 | A kind of big data report processing method based on ElasticSearch |
CN108874524A (en) * | 2018-06-21 | 2018-11-23 | 山东浪潮商用系统有限公司 | Big data distributed task dispatching system |
CN109325047A (en) * | 2018-11-22 | 2019-02-12 | 北京明朝万达科技股份有限公司 | A kind of interactive mode ElasticSearch depth paging query method and apparatus |
Non-Patent Citations (1)
Title |
---|
基于大数据的统一监控系统研究;吕霞;《计算机产品与流通》;20171115(第11期);全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN110232044A (en) | 2019-09-13 |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | TA01 | Transfer of patent application right | Effective date of registration: 20230308. Address after: 250000 Langchao Road, Jinan, Shandong. Applicant after: Inspur Genersoft Co.,Ltd. Address before: 250100 No. 2877 Kehang Road, Sun Village Town, Jinan High-tech District, Shandong Province. Applicant before: SHANDONG INSPUR GENESOFT INFORMATION TECHNOLOGY Co.,Ltd.
 | GR01 | Patent grant |