CN110232044B - System and method for realizing big data summarizing and scheduling service - Google Patents


Info

Publication number: CN110232044B
Authority: CN (China)
Prior art keywords: local, data, cloud, scheduling service, big data
Legal status: Active
Application number: CN201910521428.3A
Other languages: Chinese (zh)
Other versions: CN110232044A
Inventors: 张胤, 戴海宏, 仪思奇
Current assignee: Inspur General Software Co Ltd
Original assignee: Inspur General Software Co Ltd
Priority date: 2019-06-17
Filing date: 2019-06-17
Publication date: 2023-03-28

2019-06-17: Application filed by Inspur General Software Co Ltd; priority to CN201910521428.3A
2019-09-13: Publication of CN110232044A
2023-03-28: Application granted; publication of CN110232044B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/122 File system administration, e.g. details of archiving or snapshots using management policies
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/137 Hash-based
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/1824 Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Abstract

The invention discloses a system and a method for implementing a big data summarizing and scheduling service, and belongs to the technical field of big data storage. The system comprises a cloud server and a local server. The cloud server provides a cloud Web API protected by salted-hash (Hash plus salt) authentication, together with the document database Elasticsearch; the local server provides a local scheduling service and a local relational database. The cloud server calls Elasticsearch to run aggregation and grouping queries over user-related data, the local scheduling service performs the summarizing and scheduling at configurable time intervals, and the local relational database receives the summarized data returned by the local scheduling service. The system is suitable for classifying, aggregating, summarizing and querying big data, solves the problem of associating data across cloud servers and between big data and relational data, and has good value for wider adoption and application.

Description

System and method for realizing big data summarizing and scheduling service
Technical Field
The invention relates to the technical field of big data storage, and specifically provides a system and a method for implementing a big data summarizing and scheduling service.
Background
As the user behavior data generated by a system keeps growing, its volume becomes too large to read or inspect manually. Techniques are therefore used to analyze and organize existing user behavior and to present user data in charts or other easy-to-read forms.
The ways in which collected behavior data are stored are expanding and diversifying, and it is desirable to output the collected data to more media. Users also expect a better real-time retrieval experience, and more of them require fast retrieval of massive behavior data. With the arrival of personalized charts, the demand for friendly Web interfaces and a better user experience keeps growing. To meet these requirements, Elasticsearch was chosen as the document database.
At present, to relieve the pressure of transmitting big data, a reverse proxy and a disk queue are placed in front of the cloud server, and a multi-node cluster is attached to process the data. A deployment that uses elasticx as the reverse proxy, an Apache Kafka disk queue to buffer requests, and a clustered Elasticsearch has therefore become indispensable. A Web API authenticated by a salted hash of a timestamp improves the security of APIs called over the public network and is standard practice in the industry. For configurable, automated job scheduling, the Quartz scheduler is widely accepted in the industry. In summary, a big data summarizing and scheduling service can be built on Elasticsearch, a secure Web API and the Quartz component while ensuring user experience, reliability, security and robustness.
Disclosure of Invention
The technical task of the invention is to provide a system for implementing a big data summarizing and scheduling service that is suitable for classifying, aggregating, summarizing and querying big data, and that solves the problem of associating data across cloud servers and between big data and relational data.
The invention further provides a method for implementing the big data summarizing and scheduling service.
To achieve this object, the invention provides the following technical solution:
A system for implementing a big data summarizing and scheduling service comprises a cloud server and a local server. The cloud server provides a cloud Web API protected by salted-hash authentication and the document database Elasticsearch; the local server provides a local scheduling service and a local relational database. The cloud server calls Elasticsearch to run aggregation and grouping queries over user-related data, the local scheduling service performs the summarizing and scheduling at configured time intervals, and the local relational database receives the summarized data returned by the local scheduling service.
By default the local relational database is Microsoft SQL Server; MySQL and PostgreSQL can be used as alternatives.
Preferably, the local scheduling service on the local server periodically sends HTTP requests to the cloud Web API of the cloud server at a locally configured time interval and inserts the returned summary metric data into the local relational database.
For authentication, the local server uses the timestamp and a salt value as the base values and computes an MD5 hash, returned as a hexadecimal digest, which serves as the basis for judging whether the message is trustworthy. The cloud server recomputes the hash from the transmitted salt value and compares it with the transmitted hash value; if the hash it generates equals the hash sent by the local server, the message is considered secure.
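A minimal Python sketch of this check, assuming the hashed string is a shared marker followed by the timestamp and the salt (the patent does not state the exact concatenation order). `SHARED_MARKER`, `sign` and `verify` are hypothetical names, not taken from the original code listings.

```python
import hashlib
import hmac

# Hypothetical marker string known to both the local server and the cloud server.
SHARED_MARKER = "example-marker"


def md5_hex(text: str) -> str:
    """Return the MD5 digest of UTF-8 text as a 32-character lowercase hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def sign(timestamp: int, salt: str, marker: str = SHARED_MARKER) -> str:
    """Local server: hash the base values (marker + timestamp + salt; order assumed)."""
    return md5_hex(f"{marker}{timestamp}{salt}")


def verify(timestamp: int, salt: str, received_digest: str,
           marker: str = SHARED_MARKER) -> bool:
    """Cloud server: recompute the hash from the transmitted values and compare.

    hmac.compare_digest compares in constant time, so the check does not
    reveal how many leading characters of a forged digest were correct.
    """
    expected = sign(timestamp, salt, marker)
    return hmac.compare_digest(expected.encode("ascii"),
                               received_digest.encode("utf-8"))


if __name__ == "__main__":
    digest = sign(1560729600, "a1b2c3")
    print(verify(1560729600, "a1b2c3", digest))  # True
    print(verify(1560729601, "a1b2c3", digest))  # False: timestamp altered
```

Plain MD5 over a secret-prefixed string is open to length-extension attacks, so a keyed construction such as `hmac.new(key, message, hashlib.sha256)` would be the more conventional choice today; the sketch keeps MD5 only to mirror the text.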
Preferably, the metrics provided by the cloud server are the PV (page view) count, the exception count and the active user count, and the API accepts a time period parameter and a metric type parameter.
Preferably, in Elasticsearch the users' PV details and exception details are aggregated and grouped by date to obtain the PV count, exception count and active user count that the user requires.
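The date-grouped aggregation can be expressed as an Elasticsearch search request body. The sketch below builds that body in Python and flattens the response; the field names `event_time`, `event_type` (with values `pv` and `exception`) and `user_id` are assumptions about the index mapping, not taken from the patent. `calendar_interval` requires Elasticsearch 7.2 or later (older versions use `interval`), and the `cardinality` aggregation returns an approximate distinct count.

```python
from typing import Any, Dict, List


def daily_metrics_query(start: str, end: str) -> Dict[str, Any]:
    """Search body: per-day PV count, exception count and active users in [start, end)."""
    return {
        "size": 0,  # only the aggregations are needed, not the matching documents
        "query": {"range": {"event_time": {"gte": start, "lt": end}}},
        "aggs": {
            "per_day": {
                "date_histogram": {"field": "event_time", "calendar_interval": "day"},
                "aggs": {
                    # filter aggregations report how many events of each type fell in the day
                    "pv": {"filter": {"term": {"event_type": "pv"}}},
                    "exceptions": {"filter": {"term": {"event_type": "exception"}}},
                    # active users: distinct user_id values with any event that day
                    "active_users": {"cardinality": {"field": "user_id"}},
                },
            }
        },
    }


def parse_daily_metrics(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the date_histogram buckets into one row per day."""
    rows = []
    for bucket in response["aggregations"]["per_day"]["buckets"]:
        rows.append({
            "day": bucket["key_as_string"],
            "pv": bucket["pv"]["doc_count"],
            "exceptions": bucket["exceptions"]["doc_count"],
            "active_users": bucket["active_users"]["value"],
        })
    return rows
```

The body would be posted as JSON to the `_search` endpoint of the behavior index, and the flattened rows are what the Web API would return to the local server.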
Preferably, the local scheduling service implements custom scheduling tasks based on the open-source scheduling library Quartz.
The Quartz job periodically sends an HTTP request to the Web API of the cloud server at the locally configured time interval and inserts the relevant metrics from the returned summary data into the local relational database.
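Quartz itself is a Java library; as a language-neutral illustration, the sketch below reproduces a fixed-interval trigger with Python's standard `sched` module. `schedule_collection` and `FakeClock` are hypothetical helpers; in a real job the `job` callback would sign the request, call the cloud Web API and insert the returned metrics. The fake clock exists only so the example runs instantly; in production the scheduler would be created as `sched.scheduler(time.monotonic, time.sleep)`.

```python
import sched
from typing import Callable


class FakeClock:
    """Deterministic clock: sleeping just advances `now`, so the demo runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def schedule_collection(scheduler: sched.scheduler, interval_s: float,
                        job: Callable[[], None], runs: int) -> None:
    """Fire `job` every `interval_s` seconds, `runs` times in total.

    This mimics a fixed-interval trigger; it is not the Quartz API.
    """
    if runs <= 0:
        return

    def fire(remaining: int) -> None:
        job()
        if remaining > 1:
            # Re-arm after the job finishes, so each interval is measured
            # from the end of the previous run (fixed delay).
            scheduler.enter(interval_s, 1, fire, (remaining - 1,))

    scheduler.enter(interval_s, 1, fire, (runs,))


if __name__ == "__main__":
    clock = FakeClock()
    scheduler = sched.scheduler(clock.time, clock.sleep)
    fired_at = []
    schedule_collection(scheduler, 300, lambda: fired_at.append(clock.now), runs=3)
    scheduler.run()
    print(fired_at)  # [300.0, 600.0, 900.0]
```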
A method for implementing a big data summarizing and scheduling service comprises grouping and aggregation queries over big data, with the cloud Web API and the local scheduling service collecting the results into a local relational database. It specifically comprises the following steps:
S1. The local server starts the local scheduling service, which periodically sends HTTP requests to the cloud Web API of the cloud server;
S2. On receiving the HTTP request, the cloud server performs a security check on the transmitted data and retrieves the data once security is confirmed;
S3. Based on the security check in step S2 and the data type requested through the Web API, the cloud server calls Elasticsearch to aggregate and group the data the user is concerned with;
S4. The cloud server returns the metric data;
S5. The local server receives the metric data returned by the cloud server and stores the summarized metric data.
Preferably, in step S1, a salt value parameter and an MD5 hash value parameter are added to the HTTP request to verify its security.
When the local server starts the local scheduling service, the collection interval for fetching metric data can be configured flexibly; the user-defined interval configuration is read through the Quartz component. Compared with a conventional timer component, Quartz supports more complex interval configurations: for example, according to the user's requirements, data can be collected once every N minutes during each day's peak working hours and once every 4N minutes outside working hours. Because Quartz integrates easily with Spring Boot applications, the scheduling service is deployed in production as a microservice: Spring Boot and Quartz are integrated, packaged into a Docker image, and published as a Docker container on the local host, which achieves microservice deployment.
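The peak/off-peak rule can be stated as a small function. The working-hours window (08:30 to 17:30 here) is an illustrative assumption; the patent only specifies N minutes in peak hours and 4N minutes otherwise. `collection_interval_minutes` and `next_collection` are hypothetical names.

```python
from datetime import datetime, time, timedelta


def collection_interval_minutes(now: datetime, n: int,
                                peak_start: time = time(8, 30),
                                peak_end: time = time(17, 30)) -> int:
    """N minutes inside the peak working window, 4N minutes outside it."""
    if n <= 0:
        raise ValueError("n must be a positive number of minutes")
    in_peak = peak_start <= now.time() < peak_end
    return n if in_peak else 4 * n


def next_collection(now: datetime, n: int) -> datetime:
    """Time of the next collection, using the interval that applies right now."""
    return now + timedelta(minutes=collection_interval_minutes(now, n))


if __name__ == "__main__":
    print(collection_interval_minutes(datetime(2019, 6, 17, 10, 0), 5))  # 5
    print(next_collection(datetime(2019, 6, 17, 20, 0), 5))  # 2019-06-17 20:20:00
```

In Quartz itself, the same policy could be expressed by attaching two cron triggers to one job, one firing every N minutes inside working hours and one every 4N minutes outside them.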
Preferably, in step S3, the summarized data are grouped and aggregated according to an aggregation requirement and a data-fetching convention, where the aggregation requirement is grouping by time, by enterprise or by user, and the data-fetching convention specifies the PV count, the exception count or the user count.
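A hedged sketch of how the three grouping options and three metrics could be combined into one Elasticsearch aggregation request. The field names (`event_time`, `enterprise_id`, `user_id`, `event_type`) and the bucket limit of 100 are assumptions; `terms` buckets are capped by `size`, so a real service might page through large groupings, for example with a `composite` aggregation.

```python
import copy
from typing import Any, Dict

# Grouping dimension -> bucket aggregation (field names are assumptions).
GROUPINGS: Dict[str, Dict[str, Any]] = {
    "time": {"date_histogram": {"field": "event_time", "calendar_interval": "day"}},
    "enterprise": {"terms": {"field": "enterprise_id", "size": 100}},
    "user": {"terms": {"field": "user_id", "size": 100}},
}

# Metric -> per-bucket sub-aggregation.
METRICS: Dict[str, Dict[str, Any]] = {
    "pv": {"filter": {"term": {"event_type": "pv"}}},
    "exception": {"filter": {"term": {"event_type": "exception"}}},
    "users": {"cardinality": {"field": "user_id"}},
}


def grouped_metric_query(group_by: str, metric: str, start: str, end: str) -> Dict[str, Any]:
    """Build a search body that groups by `group_by` and computes `metric` per group."""
    if group_by not in GROUPINGS:
        raise ValueError(f"unknown grouping: {group_by}")
    if metric not in METRICS:
        raise ValueError(f"unknown metric: {metric}")
    # Deep copies keep the shared templates above from being mutated.
    groups = copy.deepcopy(GROUPINGS[group_by])
    groups["aggs"] = {"metric": copy.deepcopy(METRICS[metric])}
    return {
        "size": 0,
        "query": {"range": {"event_time": {"gte": start, "lt": end}}},
        "aggs": {"groups": groups},
    }
```

Keeping the grouping and metric tables separate means the Web API only needs to validate two request parameters before building the query.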
Security authentication of the Web API. The HTTP message body contains a salt parameter, an MD5 digest parameter and a metric data list parameter. The salt is composed of the current timestamp, a random number and two public marker strings; the MD5 digest is the hexadecimal MD5 digest computed over the public marker strings combined with the salt. The metric data list contains the specific metrics requested by the user (PV count, exception count or user count).
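The message layout can be sketched in Python as follows. The JSON field names (`salt`, `digest`, `metrics`), the marker values and the order in which the parts are concatenated are assumptions for illustration; the patent does not publish them.

```python
import hashlib
import hmac
import json
import secrets
import time
from typing import List, Optional

# Hypothetical public marker strings agreed between the local and cloud servers.
MARKER_A = "markerA"
MARKER_B = "markerB"


def make_salt(timestamp: int, nonce: int) -> str:
    """Salt = current timestamp + random number + the two marker strings."""
    return f"{timestamp}{nonce}{MARKER_A}{MARKER_B}"


def make_digest(salt: str) -> str:
    """Hex MD5 digest of the marker strings combined with the salt."""
    return hashlib.md5(f"{MARKER_A}{MARKER_B}{salt}".encode("utf-8")).hexdigest()


def build_body(metrics: List[str], timestamp: Optional[int] = None,
               nonce: Optional[int] = None) -> str:
    """Local server: serialize the salt, digest and requested metric list as JSON."""
    ts = int(time.time()) if timestamp is None else timestamp
    rnd = secrets.randbelow(1_000_000) if nonce is None else nonce
    salt = make_salt(ts, rnd)
    return json.dumps({"salt": salt, "digest": make_digest(salt), "metrics": metrics})


def check_body(raw: str) -> bool:
    """Cloud server: recompute the digest from the received salt and compare."""
    message = json.loads(raw)
    expected = make_digest(message["salt"])
    return hmac.compare_digest(expected.encode("ascii"), message["digest"].encode("utf-8"))


if __name__ == "__main__":
    print(check_body(build_body(["pv", "exception"])))  # True
```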
Preferably, in step S5, the local server receives the metric data returned by the cloud server, deserializes the returned JSON into a list of data objects to be stored in the local relational database, and inserts the corresponding metric data into that database.
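Step S5 can be sketched with the standard `json`, `dataclasses` and `sqlite3` modules. SQLite stands in for Microsoft SQL Server only so the example is self-contained; the JSON shape, the `MetricRow` class and the `metric_summary` table are hypothetical. The upsert (`INSERT OR REPLACE` keyed on day and metric) is a design choice that keeps repeated collections of the same period from creating duplicate rows; on SQL Server a `MERGE` statement would play the same role.

```python
import json
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass
class MetricRow:
    """One summarized metric value for one day (hypothetical shape)."""
    day: str
    metric: str
    value: int


def deserialize(payload: str) -> List[MetricRow]:
    """Turn the returned JSON array into a list of data objects."""
    return [MetricRow(**item) for item in json.loads(payload)]


def store(conn: sqlite3.Connection, rows: List[MetricRow]) -> int:
    """Insert (or overwrite) each metric row; returns the number of rows written."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS metric_summary ("
            " day TEXT NOT NULL, metric TEXT NOT NULL, value INTEGER NOT NULL,"
            " PRIMARY KEY (day, metric))"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO metric_summary (day, metric, value) VALUES (?, ?, ?)",
            [(r.day, r.metric, r.value) for r in rows],
        )
    return len(rows)
```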
Compared with the prior art, the method for implementing a big data summarizing and scheduling service has the following notable benefits. The method relies on the document database Elasticsearch, an open-source big data search engine based on Apache Lucene that provides distributed, multi-tenant-capable big data storage and full-text search. The invention provides a grouping and aggregation query interface based on Elasticsearch, publishes it as a Web service protected by salted-hash authentication, implements custom scheduling tasks based on the open-source scheduling library Quartz, and isolates queries against the search engine database through salted-hash authentication. The result is a highly secure and reliable data summarizing and scheduling service that is suitable for classified aggregation, summarization and querying of big data, solves the problem of associating data across cloud servers and between big data and relational data, and has good value for wider adoption and application.
Drawings
Fig. 1 is an architecture diagram of the system for implementing the big data summarizing and scheduling service according to the present invention.
Detailed Description
The system and method for implementing the big data summarizing and scheduling service are described in detail below with reference to the accompanying drawing and an embodiment.
Examples
As shown in Fig. 1, the system for implementing the big data summarizing and scheduling service of the present invention comprises a cloud server and a local server.
The cloud server provides a cloud Web API protected by salted-hash authentication and the document database Elasticsearch; the local server provides a local scheduling service and a local relational database. The cloud server calls Elasticsearch to run aggregation and grouping queries over user-related data, the local scheduling service performs the summarizing and scheduling at configured time intervals, and the local relational database receives the summarized data returned by the local scheduling service.
The local relational database is Microsoft SQL Server.
The local scheduling service on the local server periodically sends HTTP requests to the cloud Web API of the cloud server at a locally configured time interval and inserts the returned summary metric data into the local relational database. For authentication, the local server uses the timestamp and a salt value as the base values and computes an MD5 hash, returned as a hexadecimal digest, which serves as the basis for judging whether the message is trustworthy. The cloud server recomputes the hash from the transmitted salt value and compares it with the transmitted hash value; if the hash it generates equals the hash sent by the local server, the message is considered secure. The local scheduling service implements custom scheduling tasks based on the open-source scheduling library Quartz, whose job periodically sends an HTTP request to the Web API of the cloud server at the locally configured time interval and inserts the relevant metrics from the returned summary data into the local relational database.
The metrics provided by the cloud server are the PV (page view) count, the exception count and the active user count, and the API accepts a time period parameter and a metric type parameter. Elasticsearch aggregates and groups the users' PV details and exception details by date to obtain the PV count, exception count and active user count that the user requires.
The method for implementing the big data summarizing and scheduling service comprises grouping and aggregation queries over big data, with the cloud Web API and the local scheduling service collecting the results into a local relational database. It specifically comprises the following steps:
S1. The local server starts the local scheduling service, which sends HTTP requests to the cloud Web API of the cloud server at regular intervals. A salt value parameter and an MD5 hash value parameter are added to the HTTP request to verify its security.
Security authentication of the Web API. The HTTP message body contains a salt parameter, an MD5 digest parameter and a metric data list parameter. The salt is composed of the current timestamp, a random number and two public marker strings; the MD5 digest is the hexadecimal MD5 digest computed over the public marker strings combined with the salt. The metric data list contains the specific metrics requested by the user (PV count, exception count or user count).
When the local server starts the local scheduling service, the collection interval for fetching metric data can be configured flexibly; the user-defined interval configuration is read through the Quartz component. Compared with a conventional timer component, Quartz supports more complex interval configurations: for example, according to the user's requirements, data can be collected once every N minutes during each day's peak working hours and once every 4N minutes outside working hours. Because Quartz integrates easily with Spring Boot applications, the scheduling service is deployed in production as a microservice: Spring Boot and Quartz are integrated, packaged into a Docker image, and published as a Docker container on the local host, which achieves microservice deployment.
The code that sends the HTTP request in this process is published as images in the original document:
[Code listing, part 1: images BDA0002096825960000051 to BDA0002096825960000111]
[Code listing, part 2: images BDA0002096825960000121 to BDA0002096825960000141]
S2. On receiving the HTTP request, the cloud server performs a security check on the transmitted data and retrieves the data once security is confirmed.
S3. Based on the security check in step S2 and the data type requested through the Web API, the cloud server calls Elasticsearch to aggregate and group the data the user is concerned with.
The summarized data are grouped and aggregated according to an aggregation requirement and a data-fetching convention, where the aggregation requirement is grouping by time, by enterprise or by user, and the data-fetching convention specifies the PV count, the exception count or the user count.
The decryption code for the handshake-protocol communication is published as images in the original document:
[Code listing: images BDA0002096825960000142 to BDA0002096825960000201]
S4. The cloud server returns the metric data.
S5. The local server receives the metric data returned by the cloud server and stores the aggregated metric data.
The local server deserializes the returned JSON data into a list of data objects to be stored in the local relational database and inserts the corresponding metric data into that database.
The embodiments described above are merely preferred embodiments of the present invention; ordinary changes and substitutions made by those skilled in the art within the technical scope of the present invention fall within its protection scope.

Claims (2)

1. A system for implementing a big data summarizing and scheduling service, characterized in that: the system comprises a cloud server and a local server; the cloud server provides a cloud Web API protected by salted-hash authentication and the document database Elasticsearch; the local server provides a local scheduling service and a local relational database; the cloud server calls Elasticsearch to run aggregation and grouping queries over user-related data; the local scheduling service performs the summarizing and scheduling at configured time intervals; the local relational database receives the summarized data returned by the local scheduling service; the local scheduling service on the local server periodically sends an HTTP request to the cloud Web API of the cloud server at the locally configured time interval and inserts the returned summary metric data into the local relational database; the metrics provided by the cloud server are the PV count, the exception count and the active user count, and the accepted parameters are a time period parameter and a metric type parameter; Elasticsearch aggregates and groups the users' PV and exception details by date to obtain the PV count, exception count and active user count required by the user; and the local scheduling service implements custom scheduling tasks based on the open-source scheduling library Quartz.
2. A method for implementing a big data summarizing and scheduling service, characterized in that: the method comprises grouping and aggregation queries over big data, with the cloud Web API and the local scheduling service collecting the results into a local relational database, and specifically comprises the following steps:
S1. The local server starts the local scheduling service, which sends HTTP requests to the cloud Web API of the cloud server at regular intervals; a salt value parameter and an MD5 hash value parameter are added to the HTTP request to verify its security;
S2. On receiving the HTTP request, the cloud server performs a security check on the transmitted data and retrieves the data once security is confirmed;
S3. Based on the security check in step S2 and the data type requested through the Web API, the cloud server calls Elasticsearch to aggregate and group the data the user is concerned with;
S4. The cloud server returns the metric data, the summarized data being grouped and aggregated according to an aggregation requirement and a data-fetching convention, where the aggregation requirement is grouping by time, by enterprise or by user, and the data-fetching convention specifies the PV count, the exception count or the user count;
S5. The local server receives the metric data returned by the cloud server and stores the summarized metric data: the returned JSON data are deserialized into a list of data objects to be stored in the local relational database, and the corresponding metric data are inserted into that database.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910521428.3A CN110232044B 2019-06-17 2019-06-17 System and method for realizing big data summarizing and scheduling service

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910521428.3A CN110232044B 2019-06-17 2019-06-17 System and method for realizing big data summarizing and scheduling service

Publications (2)

Publication Number Publication Date
CN110232044A 2019-09-13
CN110232044B 2023-03-28

Family

ID=67859989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910521428.3A Active CN110232044B 2019-06-17 2019-06-17 System and method for realizing big data summarizing and scheduling service

Country Status (1)

Country Link
CN (1) CN110232044B

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110865608A (en) * 2019-11-21 2020-03-06 武夷学院 Reconfigurable manufacturing system
CN113630238B (en) * 2021-08-10 2024-02-23 中国工商银行股份有限公司 User request permission method and device based on password confusion
CN114338812B (en) * 2022-01-07 2024-04-05 德微电技术(深圳)有限公司 Equipment networking control adjusting system
CN114564455B (en) * 2022-02-25 2024-01-19 苏州浪潮智能科技有限公司 Data set display method, device and equipment of distributed system and storage medium

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
WO2012139328A1 (en) * 2011-04-14 2012-10-18 中兴通讯股份有限公司 Cache server system and application method thereof, cache client, and cache server
CN107958080A (en) * 2017-12-14 2018-04-24 上海特易信息科技有限公司 A kind of big data report processing method based on ElasticSearch
CN108874524A (en) * 2018-06-21 2018-11-23 山东浪潮商用系统有限公司 Big data distributed task dispatching system
CN109325047A (en) * 2018-11-22 2019-02-12 北京明朝万达科技股份有限公司 A kind of interactive mode ElasticSearch depth paging query method and apparatus

Non-Patent Citations (1)

Title
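Research on a unified monitoring system based on big data (基于大数据的统一监控系统研究); 吕霞; Computer Products and Circulation (《计算机产品与流通》); 2017-11-15 (No. 11); full text *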

Also Published As

Publication number Publication date
CN110232044A 2019-09-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230308

Address after: 250000 Langchao Road, Jinan, Shandong

Applicant after: Inspur Genersoft Co.,Ltd.

Address before: 250100 No. 2877 Kehang Road, Sun Village Town, Jinan High-tech District, Shandong Province

Applicant before: SHANDONG INSPUR GENESOFT INFORMATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant