CN113111097A - Method for realizing high-speed query of ocean data by using distributed database technology - Google Patents


Info

Publication number
CN113111097A
Authority
CN
China
Prior art keywords
query
model
service
result
data
Prior art date
Legal status
Pending
Application number
CN202110516187.0A
Other languages
Chinese (zh)
Inventor
韦广昊
宋晓
韩璐遥
梁建峰
刘志杰
韩春花
李维禄
陈斐
Current Assignee
NATIONAL MARINE DATA AND INFORMATION SERVICE
Original Assignee
NATIONAL MARINE DATA AND INFORMATION SERVICE
Priority date 2021-05-12
Filing date 2021-05-12
Publication date 2021-07-13
Application filed by NATIONAL MARINE DATA AND INFORMATION SERVICE
Priority to CN202110516187.0A
Publication of CN113111097A
Pending (current legal status)



Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

The invention provides a method for realizing high-speed query of ocean data by using distributed database technology, comprising the following steps: S1, constructing a marine-specific query algorithm model, modeled on MLlib, using In-database Analysis technology, and associating the query algorithm model with query service requirements to obtain a service model; S2, upon receiving a query request, associating the request with the corresponding service model and having the service model communicate with the query algorithm model through asynchronous messaging to complete the query and obtain a query result; and S3, splitting the query result set and outputting it in a distributed manner as small-unit result tables. The disclosed method deeply integrates marine domain algorithm models with distributed database technology and achieves second-level (within seconds) response times for model result sets.

Description

Method for realizing high-speed query of ocean data by using distributed database technology
Technical Field
The invention belongs to the technical field of database query, and particularly relates to a method for realizing high-speed query of ocean data by using a distributed database technology.
Background
The marine environment is widely distributed, shaped by complex influencing factors, and changes in ways that cannot be controlled. As science and technology have advanced, the number of sensing-device types has grown, resource systems have become complex, and the range of disciplines covered has become very large. Making comprehensive use of marine environment data, for example life-cycle tracing, raising data value, difference analysis, and accurate query, therefore requires the sensible use of modern information technologies such as cloud computing, virtualization, big data, and intelligent mining and analysis. Applied to massive, multi-source, complex, multi-type marine environment data, these technologies can build an ecological chain of data, information, knowledge, and value for the efficient integrated analysis and use of marine environment information resources, and can markedly improve the technical, methodological, and platform management capabilities for those resources.
At present, distributed database systems improve the performance of distributed concurrent queries mainly by deploying on more nodes and enlarging the cluster. When applied to the ocean domain, however, distributed concurrent query suffers performance degradation under complex computing workloads, for two reasons: 1. Ocean data queries must schedule multi-disciplinary, multi-type data together, so queries over the marine comprehensive database involve a large amount of specialized, complex computation. 2. This complex specialized computation places heavy pressure on the database's responsiveness, and the efficiency of large volumes of interactive computation falls short of the high query speed that complex application scenarios demand.
Disclosure of Invention
In view of this, the present invention is directed to a method for implementing high-speed query of marine data by using a distributed database technology, so as to improve query efficiency.
To achieve this purpose, the technical solution of the invention is as follows.
The method chiefly aims to improve the efficiency of ocean service queries; it therefore integrates specialized algorithm models into the ocean data query scenario, which is how it achieves efficient querying.
The method for realizing high-speed query of ocean data by utilizing the distributed database technology comprises the following steps:
S1. Construct a marine-specific query algorithm model, modeled on MLlib, using In-database Analysis technology, and associate the query algorithm model with query service requirements to obtain a service model;
S2. Upon receiving a query request, associate the request with the corresponding service model, and have the service model communicate with the query algorithm model through asynchronous messaging to complete the query and obtain a query result;
S3. Split the query result set and output it in a distributed manner as small-unit result tables.
Further, in step S1, when the business model is constructed, it directly associates the marine comprehensive database with the query algorithm model according to the business logic, forming a mapping relationship between them.
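A minimal sketch of that mapping relationship, assuming an in-memory registry in Python. `BusinessModel`, `register`, `resolve`, and the table and algorithm names are hypothetical illustrations, not identifiers from the patent; a real system would presumably persist the mapping in the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessModel:
    """Hypothetical business model: binds one query requirement to source
    tables in the marine comprehensive database and to one algorithm model."""
    name: str
    source_tables: tuple[str, ...]
    algorithm: str
    timed: bool = False  # True if results are precomputed on a schedule


# Hypothetical registry, filled once when the business models are constructed.
REGISTRY: dict[str, BusinessModel] = {}


def register(requirement: str, model: BusinessModel) -> None:
    """Record the mapping requirement -> business model at build time."""
    if requirement in REGISTRY:
        raise ValueError(f"requirement already mapped: {requirement}")
    REGISTRY[requirement] = model


def resolve(requirement: str) -> BusinessModel:
    """Find the business model for an incoming query requirement (step S2)."""
    try:
        return REGISTRY[requirement]
    except KeyError:
        raise LookupError(f"no business model for: {requirement}") from None


# Hypothetical example entry; table and algorithm names are illustrative only.
register(
    "subject_association_monthly",
    BusinessModel(
        name="subject association influence analysis",
        source_tables=("argo_profiles", "gtspp_profiles"),
        algorithm="apriori_association_rules",
        timed=True,
    ),
)
```

Building the mapping once and resolving it per request keeps the per-query work to a dictionary lookup, which is in the spirit of the business model being the single entry point for query requirements.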
Further, in step S1, a query requirement for ocean data may be either a real-time query or a timed (scheduled) query, and the results of timed queries are stored in the database. When a service query request is received: if it is a timed query, the stored result is pushed directly from the database; if it is a real-time query, the query is executed to obtain the latest result, which is split into temporary tables and fed back directly, then stored in the database for later use.
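The routing between timed and real-time queries might look like the following sketch. The in-memory `result_library` stands in for the analysis result library in the database, and the fixed `chunk_size` split stands in for the temporary-table split; both, like the function names, are assumptions for illustration.

```python
from typing import Callable

Row = dict[str, object]

# Hypothetical stand-in for the analysis result library:
# business model name -> stored result rows.
result_library: dict[str, list[Row]] = {}


def split_rows(rows: list[Row], chunk_size: int) -> list[list[Row]]:
    """Split a result set into temporary tables of at most chunk_size rows."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def handle_query(model: str, realtime: bool,
                 compute: Callable[[], list[Row]],
                 chunk_size: int = 2) -> list[list[Row]]:
    """Timed query: push the stored result without recomputing.
    Real-time query: compute the latest result, feed it back as split
    temporary tables, and store it in the library for later use."""
    if not realtime:
        if model not in result_library:
            raise LookupError(f"no stored result for timed model {model!r}")
        return split_rows(result_library[model], chunk_size)
    latest = compute()
    result_library[model] = latest  # warehoused for later use
    return split_rows(latest, chunk_size)
```

A timed query never calls `compute`, so its cost is a lookup plus the split; only real-time queries pay for recomputation.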
Further, in step S2, after a query request message is issued, the query algorithm model responds to the message it receives; meanwhile, message communication does not wait for that response but issues further messages concurrently.
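A sketch of this non-blocking dispatch pattern using Python's `asyncio`, assuming each call to the query algorithm model can be awaited as a coroutine. `algorithm_model` is a hypothetical stand-in that simply sleeps for a given delay; the completion log shows that later messages are issued before earlier replies arrive.

```python
import asyncio


async def algorithm_model(message: str, delay: float, log: list[str]) -> str:
    """Hypothetical stand-in for the query algorithm model: works for
    `delay` seconds, records when it finished, then replies."""
    await asyncio.sleep(delay)
    log.append(message)
    return f"result:{message}"


async def dispatch(messages: dict[str, float]) -> tuple[dict[str, str], list[str]]:
    """Issue every message without waiting for earlier replies, then collect
    all replies. Returns replies keyed by message and the completion order."""
    log: list[str] = []
    # create_task schedules each coroutine immediately; nothing is awaited yet.
    tasks = {
        name: asyncio.create_task(algorithm_model(name, delay, log))
        for name, delay in messages.items()
    }
    # gather() returns results in task order, regardless of completion order.
    replies = await asyncio.gather(*tasks.values())
    return dict(zip(tasks.keys(), replies)), log
```

Because all coroutines are scheduled before any reply is awaited, the total wait is roughly the longest single delay rather than the sum, and replies complete in order of their own processing time rather than issue order.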
Further, in step S3, numerical results are split by rule into value intervals, and results that differ by region are split according to region rules.
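A sketch of both splitting rules, assuming result rows are dictionaries. The interval bounds, the choice of which side of each bound is closed, and the region field name are assumptions, since the text does not specify them.

```python
from bisect import bisect_right
from collections import defaultdict

Row = dict[str, object]


def split_by_interval(rows: list[Row], key: str,
                      bounds: list[float]) -> dict[int, list[Row]]:
    """Split rows into small tables by numeric interval.

    `bounds` must be sorted ascending. A value v goes to bucket
    bisect_right(bounds, v): bucket 0 holds v < bounds[0], bucket i holds
    bounds[i-1] <= v < bounds[i], and the last bucket holds v >= bounds[-1].
    """
    buckets: dict[int, list[Row]] = defaultdict(list)
    for row in rows:
        buckets[bisect_right(bounds, row[key])].append(row)
    return dict(buckets)


def split_by_region(rows: list[Row], key: str) -> dict[object, list[Row]]:
    """Split rows into small tables, one per region label."""
    tables: dict[object, list[Row]] = defaultdict(list)
    for row in rows:
        tables[row[key]].append(row)
    return dict(tables)
```

With hypothetical bounds `[0.45, 0.7, 1.0]`, values fall into four buckets: below 0.45, from 0.45 up to 0.7, from 0.7 up to 1.0, and 1.0 or above.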
Compared with the prior art, the method has the following advantage:
it deeply integrates marine domain algorithm models with distributed database technology and achieves second-level (within seconds) response times for model result sets.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a block diagram of an overall architecture of a method for implementing high-speed query of marine data using distributed database technology according to an embodiment of the present invention;
FIG. 2 is a flowchart of a query algorithm model process according to an embodiment of the present invention;
FIG. 3 is a flowchart of ocean data query according to an embodiment of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The present invention is described in detail below with reference to the drawings and embodiments.
Multi-dimensional query of marine environment data is a main scenario in marine business applications. Its low query response efficiency stems mainly from the complexity and specialization of marine environment data: producing the final query result requires data interaction with a specialized algorithm system, and this inter-system interaction consumes a large amount of resources, so optimal performance cannot be reached.
The invention deeply integrates marine domain algorithm models with distributed database technology and achieves second-level response times for model result sets.
The invention builds a marine query algorithm module on top of distributed data processing technology. It draws, across multiple dimensions, on the subject resources accumulated in the marine environment comprehensive database platform, and integrates big data mining methods with In-Database machine learning algorithms to build up large-scale query capability. This efficiently maps marine environment service scenarios to the result sets they require and gives query scenarios a new, highly responsive form.
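One mining algorithm this description names for the query algorithm model is "Apriori frequent itemset - association rule". The following is a compact, textbook-style Apriori sketch in pure Python, not HYMLlib's implementation (which is not disclosed); each transaction is assumed to be the set of subject tags attached to one record.

```python
from itertools import combinations


def apriori(transactions: list[set[str]],
            min_support: float) -> dict[frozenset[str], float]:
    """Return every itemset whose support (fraction of transactions that
    contain it) is at least min_support, found level by level."""
    if not transactions:
        return {}
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent: dict[frozenset[str], float] = {}
    k = 1
    while candidates:
        # Count step: keep the k-itemset candidates with enough support.
        level: dict[frozenset[str], float] = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent: dict[frozenset[str], float],
                      min_confidence: float):
    """Derive rules X -> Y with confidence = support(X | Y) / support(X).
    Every subset of a frequent itemset is frequent, so the lookup is safe."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for combo in combinations(itemset, r):
                lhs = frozenset(combo)
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules
```

This brute-force counting scans all transactions per candidate, which is fine for illustration; at the data volumes described, a distributed implementation would be needed.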
The method for realizing high-speed query of ocean data by utilizing the distributed database technology comprises the following steps:
Step 1: construct a marine-specific query algorithm model, modeled on MLlib, using In-database Analysis technology, and name it HYMLlib; associate the query algorithm model with query service requirements to obtain a service model;
Step 2: upon receiving a query request, associate the request with the corresponding service model, and have the service model communicate with the query algorithm model through asynchronous messaging to complete the query and obtain a query result;
after a query request message is issued, the query algorithm model responds to the message it receives, while message communication proceeds without waiting for that response and concurrently issues the 2nd, 3rd, and subsequent messages. This avoids the heavy resource use of a send, acknowledge, and retransmit round-trip pattern and improves communication efficiency.
When multiple query messages are issued, each is delivered asynchronously to the service model, which answers each according to its query requirement and at the pace of its own computation and splitting. Because no message holds a channel while waiting for a response, communication for the other messages is not slowed down.
Step 3: split the query result set and output it in a distributed manner as small-unit result tables.
For example, the result set returned by an association-influence query can be split by ranges of association values, such as the four ranges >1, 0.7-1, 0.45-0.6, and <0.45.
Specifically, as shown in FIG. 2, step 1 comprises the following sub-steps:
1. Using In-database Analysis technology, construct a marine-specific query algorithm model, modeled on MLlib, and name it HYMLlib;
fuse into the query algorithm model the machine learning algorithms required by specific marine query service requirements;
for example, by applying the "Apriori frequent itemset - association rule" algorithm in the query algorithm model, a "subject association influence analysis model" is trained according to marine business requirements, forming an association analysis business model that supports output of subject-association query data sets.
An ocean service query requirement may be a real-time query or a timed query, and timed-query results are stored in a database. For example, algorithms such as logistic regression and "Apriori frequent itemset - association rule" are associated with each service analysis model as it requires; the data of the comprehensive service database are bound to these algorithms and trained to obtain the query algorithm model, and the resulting query result sets are written back into the comprehensive service database as an independent analysis-algorithm result library.
When a service is queried, a timed query pushes its result directly from the database; a real-time query computes the latest result through communication between the database and the algorithm, splits it into temporary tables for direct feedback, and then stores it in the database for later use.
2. Associate the query algorithm model, the query requirements, and the data source to construct a business model;
the business model is the sole entry point for responding to query requirements. A query requirement is passed directly to the business model, which is a logical model of a marine business subject database built according to the query business requirements; it embodies the business logic that decomposes and answers the query requirement.
When the business model is constructed, the marine comprehensive database is associated directly with the algorithm model according to the model's business logic, forming a mapping relationship between the two.
For example, for a timed monthly-report query on the association influence among ocean business subjects, the query command is first passed to the corresponding business model. The business model retrieves the monthly-report result set that the marine comprehensive database and the "Apriori frequent itemset - association rule" algorithm compute on a fixed monthly schedule, splits the result set as required, and returns it through the business model; after caching, it is served to the front-end query interface.
3. Execute the corresponding query algorithm model through the UDF/UDAF (user-defined function / user-defined aggregate function) programming interfaces.
Using the database data associated with the query algorithm model, clustering, classification, and similar analyses are run on a schedule or in real time according to the update rules, and their result sets are stored.
In step 2, a multi-link asynchronous message scheduling mode is adopted that combines multi-channel CPU resource scheduling with asynchronous message communication, saving network overhead and raising the ceiling on marine environment data query response.
Building on the database's distribution strategy, the method exploits multi-data concurrency with the goal of fast response for ocean query services.
In step 3, the query result set is split so that computation runs in a distributed fashion over small data tables, avoiding large data sets and saving network overhead. Specifically, the following rules may be set:
1) A model is formed from the service scenario requirements and the query algorithm according to configuration rules; the configuration rules include a classification attribute (classifying the service or algorithm), a priority attribute, model algorithm labels, and so on.
2) Query requirements of different service scenarios are ranked by priority according to the configuration rules.
3) Result tables are split and distributed according to the result set configuration rules, producing a small-table mode for result sets.
For example, numerical results are split by rule into value intervals, and results that differ by region are split according to region rules.
4) Distribution strategies are matched according to the classification attribute of each service.
A. Avoid the random distribution strategy
The invention avoids the default random distribution mode: model result tables are distributed with a customized strategy based on classification, which gives the service fast response.
B. Avoid view-based virtual tables
The invention avoids ordinary views (virtual tables), which incur the cost of expanding the view's SQL at query time, and instead uses materialized views to improve data query response.
Examples
1. Divide the query services according to marine business requirements into full queries and customized queries, where:
full query requirement: a full query returns the data volume, time coverage, and distribution of elements. The elements to be queried can be filtered by various conditions, such as source, time range, spatial range, voyage, equipment code, survey organization, and domestic or international origin.
customized query requirement: a customized query returns the data volume, time coverage, and distribution of an element data table, and lets the user freely choose conditions on each field of the table.
2. Plan the process flow according to the query business requirements for ocean data, as shown in FIG. 3.
3. Associate data tables according to the planned query service requirements to form a data model.
4. Construct the marine query algorithm model.
This example constructs monthly data-report models for timed queries on Argo, GTSPP, WOD, ICODAS, GTS, NDBC, DBCP, NEAR-GOOS, GLOSS, IOC water level, US ocean station, NGDC, IODP, and other data.
Data receipt is reported monthly for each data source, so a data source analysis monthly-report model, which is a business model, must be constructed.
When the service is queried, the query requirement is automatically associated with the "data source analysis monthly-report model". The business logic of this business model is established at construction time according to the data content to be analyzed; data tables and the algorithm model are associated according to that business logic, and the resulting calculations are persisted as the analysis report data for each month.
Using distributed database technology, the timed monthly-report model is built with its input and output views designed as materialized views, forming physical tables.
Data are output according to the query requirement.
5. According to the monthly-report output requirements, algorithms such as K-Means clustering are incorporated, and cluster mining over historical and related data forms the query result set.
6. Match the result set data to the corresponding distribution strategy according to the configuration rules, and distribute the data automatically into the small-table mode.
7. Output the query result.
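The K-Means clustering named in item 5 can be sketched as Lloyd's algorithm in pure Python. Each point is assumed to be a 2-D feature vector (for example, two hypothetical per-source monthly statistics); the caller supplies the initial centroids so the result is deterministic, whereas production implementations usually use a randomized seeding such as k-means++.

```python
Point = tuple[float, float]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points: list[Point], centroids: list[Point],
           max_iter: int = 100) -> tuple[list[int], list[Point]]:
    """Lloyd's k-means starting from the given centroids.

    Returns the cluster label of each point and the final centroids.
    """
    if not centroids:
        raise ValueError("need at least one initial centroid")
    centroids = list(centroids)
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: _sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids
```

The resulting labels could then serve as the classification attribute that drives splitting and distribution in items 6 and 7.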
As shown in FIG. 1, in this embodiment, the query analysis system sends a data analysis request to the database, the database performs distributed parallel computation to obtain the query analysis result, and the small-table analysis results are returned to the terminal.
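This request flow follows a scatter-gather pattern: each node computes a small partial result over its local data, and only those partials travel back to be merged. The sketch below simulates nodes with a thread pool. `partial_aggregate` and `parallel_mean` are hypothetical names. The mean is merged from (count, sum) pairs, because averaging per-node averages would be wrong when partitions differ in size.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(partition: list[float]) -> tuple[int, float]:
    """Work done on one (simulated) node: count and sum of its local rows."""
    return len(partition), sum(partition)


def parallel_mean(partitions: list[list[float]], workers: int = 4) -> float:
    """Scatter the request to every partition, then merge the small partial
    results into the final answer returned to the terminal."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_aggregate, partitions))
    total_n = sum(n for n, _ in partials)
    if total_n == 0:
        raise ValueError("no rows in any partition")
    return sum(s for _, s in partials) / total_n
```

Each partial is just two numbers regardless of partition size, which is what keeps the network cost of the gather step small.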
The above description covers only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims (5)

1. A method for realizing high-speed query of ocean data by using distributed database technology, characterized by comprising the following steps:
S1. constructing a marine-specific query algorithm model, modeled on MLlib, using In-database Analysis technology, and associating the query algorithm model with query service requirements to obtain a service model;
S2. upon receiving a query request, associating the request with the corresponding service model, and having the service model communicate with the query algorithm model through asynchronous messaging to complete the query and obtain a query result;
S3. splitting the query result set and outputting it in a distributed manner as small-unit result tables.
2. The method of claim 1, wherein in step S1, when the business model is constructed, it directly associates the marine comprehensive database with the query algorithm model according to the business logic, forming a mapping relationship.
3. The method of claim 1, wherein in step S1, a query requirement for ocean data may be a real-time query or a timed query, and the results of timed queries are stored in the database;
when a service query request is received, if it is a timed query, the stored result is pushed directly from the database; if it is a real-time query, the query is executed to obtain the latest result, which is split into temporary tables and fed back directly, then stored in the database for later use.
4. The method of claim 1, wherein in step S2, after a query request message is published, the query algorithm model responds to the message it receives, while message communication does not wait for that response but publishes other messages concurrently.
5. The method of claim 1, wherein in step S3, numerical results are split by rule into value intervals, and results that differ by region are split according to region rules.
CN202110516187.0A 2021-05-12 2021-05-12 Method for realizing high-speed query of ocean data by using distributed database technology Pending CN113111097A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110516187.0A CN113111097A (en) 2021-05-12 2021-05-12 Method for realizing high-speed query of ocean data by using distributed database technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110516187.0A CN113111097A (en) 2021-05-12 2021-05-12 Method for realizing high-speed query of ocean data by using distributed database technology

Publications (1)

Publication Number Publication Date
CN113111097A (en) 2021-07-13

Family

ID=76722387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110516187.0A Pending CN113111097A (en) 2021-05-12 2021-05-12 Method for realizing high-speed query of ocean data by using distributed database technology

Country Status (1)

Country Link
CN (1) CN113111097A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114218263A (en) * 2022-02-23 2022-03-22 浙江一山智慧医疗研究有限公司 Automatic creation method of materialized view and rapid query method based on materialized view

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103514284A (en) * 2013-09-29 2014-01-15 方正国际软件有限公司 Data display system and data display method
US20150370897A1 (en) * 2014-06-18 2015-12-24 Alibaba Group Holding Limited Data query method and apparatus
CN105912624A (en) * 2016-04-07 2016-08-31 北京中安智达科技有限公司 Query method for distributed deployed heterogeneous database
CN110807044A (en) * 2019-10-30 2020-02-18 东莞市盟大塑化科技有限公司 Model dimension management method based on artificial intelligence technology

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
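高明: "Research on a multi-element marine hydrological monitoring system", Proceedings of the 2020 (8th) China Water Conservancy Informatization Technology Forum *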

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114218263A (en) * 2022-02-23 2022-03-22 浙江一山智慧医疗研究有限公司 Automatic creation method of materialized view and rapid query method based on materialized view
CN114218263B (en) * 2022-02-23 2022-05-13 浙江一山智慧医疗研究有限公司 Materialized view automatic creation method and materialized view based quick query method

Similar Documents

Publication Publication Date Title
CN102236581B (en) Mapping reduction method and system thereof for data center
CN110058573B (en) Huff and puff flexible intelligent assembly logistics path planning platform
CN110399373A (en) A kind of block chain account book storage system, storage querying method and delet method
CN110046865B (en) Distributed inventory scheduling method
CN102521246A (en) Cloud data warehouse system
CN103019728A (en) Effective complex report parsing engine and parsing method thereof
CN101739292A (en) Application characteristic-based isomeric group operation self-adapting dispatching method and system
CN104392010A (en) Subgraph matching query method
CN106371851A (en) Activiti-based business flow management system
CN103701894A (en) Method and system for dispatching dynamic resource
CN110647398A (en) Intersection control task scheduling method facing edge calculation and based on task criticality and timeliness
CN104615684A (en) Mass data communication concurrent processing method and system
CN111562966A (en) Resource arrangement method of man-machine-object fusion cloud computing platform
CN109522117A (en) Data dispatch system on a kind of chain towards under isomerous environment
CN113111097A (en) Method for realizing high-speed query of ocean data by using distributed database technology
CN114710571A (en) Data packet processing system
CN102724290B (en) Method, device and system for getting target customer group
CN109669767A (en) A kind of task encapsulation and dispatching method and system towards polymorphic type Context-dependent
CN113469647A (en) Enterprise information integration system and business processing method
CN101452486A (en) System data management method for [inscriptions on bones or tortoise shells and apparatus thereof
CN109933568A (en) A kind of industry big data platform system and its querying method
CN107291808A (en) It is a kind of that big data sorting technique is manufactured based on semantic cloud
CN110704180B (en) Workflow scheduling method based on hybrid cloud
CN101673361A (en) Technical architecture for order distribution system
CN112527823A (en) Data processing method in intelligent building management platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210713