CN114579097A - Cloud native data API construction method based on single data stream - Google Patents


Publication number
CN114579097A
Authority
CN
China
Prior art keywords
data
api
framework
construction method
method based
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210244542.8A
Other languages
Chinese (zh)
Inventor
郭晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Yisi Changtian Digital Intelligent Technology Co ltd
Original Assignee
Jiangsu Yisi Changtian Digital Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Yisi Changtian Digital Intelligent Technology Co ltd
Priority to CN202210244542.8A
Publication of CN114579097A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/20 Software design
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of networks, and in particular to a cloud native data API construction method based on a single data stream, which comprises the following steps: 10: building a multi-source heterogeneous data exchange framework; 20: building a stream-batch integrated data processing framework; 30: storing data in an Apache Hudi data lake; 40: building APIs on a FaaS platform; 50: performing data queries on the Presto platform. By adopting Apache Hudi data lake storage and an OLAP query engine based on the MPP architecture, the method can handle most data query requests and, compared with the RESTful interfaces mostly used at present, can effectively meet the data requirements of various users. Because the APIs are built on a FaaS platform, the method also effectively solves the problem that, when an abnormal API crashes the service, all data APIs become unavailable and fault isolation cannot be achieved.

Description

Cloud native data API construction method based on single data stream
Technical Field
The invention relates to the technical field of networks, and in particular to a cloud native data API construction method based on a single data stream.
Background
In data analysis and utilization, data, the analysis models built on that data, and the data applications built on both have great value when openly shared. The traditional means of open data sharing is data export, for example to local disks, FTP servers, or distributed file systems. This approach suits temporary, large-volume data exchange, but the platform loses all ability to collect information about the exported data: it cannot collect or audit information about data users, and it cannot export models or data applications, which greatly limits the functional boundary of the data service. Therefore, where an application ecosystem is being built, interfaces, and RESTful interfaces in particular, have become a more popular form of service provision. Data, models, and applications can all be opened through RESTful interfaces, and the platform can obtain basic information about the caller whenever a user calls an interface, which facilitates permission management and traffic and concurrency control and so provides a better, more stable data service.
However, the RESTful interfaces currently supplied by many vendors require customers to state their requirements clearly in advance, after which R&D effort is invested and the interfaces are completed directly at platform delivery; this model is clearly inflexible and costly. For data APIs, some open-source technologies and platform vendors implement a data API customization function, so that users can define their data requirements and the platform provides open data services in the form of APIs. However, most current technologies run all interfaces on a single set of services and cannot dynamically adjust resources for each interface at fine granularity; when an abnormal API crashes the service, all data APIs become unavailable and fault isolation cannot be achieved.
Therefore, a cloud native data API construction method based on a single data stream is needed to address the above problems.
Disclosure of Invention
The invention aims to provide a cloud native data API construction method based on a single data stream, so as to solve the problems described in the Background.
To achieve this purpose, the invention provides the following technical solution:
A cloud native data API construction method based on a single data stream comprises the following steps:
10: building a multi-source heterogeneous data exchange framework;
20: building a stream-batch integrated data processing framework;
30: storing data in an Apache Hudi data lake;
40: building APIs on a FaaS platform;
50: performing data queries on the Presto platform.
As a preferred embodiment of the present invention, step 30 may also use an Alluxio storage system to reduce I/O overhead.
As a preferred embodiment of the present invention, step 10 comprises the following steps:
101: data source abstraction: reading and writing of general data sources such as JDBC, file systems, and message queues is implemented, and a development framework and an integration method are provided so that users can develop drivers for other data sources themselves;
102: exchange behavior abstraction: for each data source, user-definable behaviors such as the exchange task run-time strategy, the strategy for writing new and old data, the dirty data filtering strategy, and the task run configuration are abstracted, and each data source implements them itself to the extent that its underlying technology supports them;
103: external metadata import: if the data source side stores metadata of the imported data, such as field comments and primary/foreign key relationships, a development framework is provided for implementing the metadata import function;
104: page-based data source management, task monitoring and alarming, data collection cataloging, and data set relationship management are supported, and the management console can be extended using a low-code development framework.
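The data source abstraction of step 101 can be illustrated with a minimal Python sketch. The patent does not disclose an interface, so every name here (`DataSource`, `register_driver`, `InMemorySource`, `exchange`) is a hypothetical chosen for illustration, and the in-memory driver merely stands in for a JDBC, file system, or message queue driver.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Type

Record = Dict[str, object]


class DataSource(ABC):
    """Hypothetical base class that every data source driver implements."""

    @abstractmethod
    def read(self) -> Iterator[Record]:
        """Yield records from the source."""

    @abstractmethod
    def write(self, records: Iterable[Record]) -> int:
        """Write records to the source and return how many were written."""


# Registry (an assumption) that lets users plug in their own drivers by name.
DRIVERS: Dict[str, Type[DataSource]] = {}


def register_driver(name: str) -> Callable[[Type[DataSource]], Type[DataSource]]:
    def decorator(cls: Type[DataSource]) -> Type[DataSource]:
        DRIVERS[name] = cls
        return cls
    return decorator


@register_driver("memory")
class InMemorySource(DataSource):
    """Toy driver standing in for JDBC, a file system, or a message queue."""

    def __init__(self, rows: Optional[List[Record]] = None) -> None:
        self.rows: List[Record] = list(rows or [])

    def read(self) -> Iterator[Record]:
        yield from self.rows

    def write(self, records: Iterable[Record]) -> int:
        added = list(records)
        self.rows.extend(added)
        return len(added)


def exchange(source: DataSource, sink: DataSource) -> int:
    """Move every record from source to sink; works for any registered driver."""
    return sink.write(source.read())
```

Because `exchange` depends only on the abstract interface, a user-written driver registered under a new name could be used without changing the exchange logic, which is the extensibility that step 101 describes.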
As a preferred embodiment of the present invention, step 20 specifically comprises the following steps:
201: constructing stream-batch integrated data processing tasks;
202: scheduling processing tasks;
203: hierarchical management of processed data;
204: UDF management: UDF data processing functions written by users are uploaded to the platform so that processing tasks can call them.
As a preferred embodiment of the present invention, step 201 supports data processing through the SQL language, Spark programs, and Flink programs, and, where the underlying framework supports it, the same processing task code can be switched between stream mode and batch mode; step 202 also supports timed scheduling, dependency scheduling, and bringing scheduled tasks online and offline, so as to form a processing task workflow, and supports a timeout warning function.
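The dependency scheduling mentioned for step 202 can be sketched with a topological sort. This is an illustrative sketch only, not the patent's scheduler: it uses Python's standard `graphlib` module (Python 3.9+), and the function name `run_workflow` and the layer names in the usage note are assumptions.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_workflow(tasks: Dict[str, Callable[[], None]],
                 depends_on: Dict[str, Set[str]]) -> List[str]:
    """Run each task only after all tasks it depends on; return the run order.

    `tasks` maps a task name to a zero-argument callable.
    `depends_on` maps a task name to the names of its upstream tasks.
    Every dependency must itself be a key of `tasks`.
    """
    graph = {name: depends_on.get(name, set()) for name in tasks}
    # static_order() raises graphlib.CycleError if the dependencies are cyclic.
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        tasks[name]()
    return order
```

For example, with three hypothetical layered tasks where `dwd` depends on `ods` and `ads` depends on `dwd`, `run_workflow` runs them in the order `ods`, `dwd`, `ads`, forming the kind of processing task workflow the text describes.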
As a preferred embodiment of the present invention, step 40 comprises the following steps:
401: a FaaS-based data open interface engine: a data ad-hoc query interface engine based on FaaS technology, in which each ad-hoc query service corresponds to one container cluster, and which provides unified interface access, load balancing, and fault isolation;
402: a data push service, which pushes data to users in the form of a message queue;
403: data desensitization management: desensitization rules for the data objects in a data service are configured according to the permissions of the data service caller, and, in addition to the common field desensitization modes for ID card numbers, mobile phone numbers, and the like, a character-filling desensitization mode is provided;
404: SLA-based storage scheduling.
In step 404, because different data stores and query frameworks can provide APIs with different SLAs, the framework provides different types of data storage corresponding to data APIs of different SLA types, and predicts, from the current data API call volume and the data integration speed, whether the SLA of a new data API can be met, so as to scale the underlying storage up or down.
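The prediction in step 404 can be illustrated with a deliberately simple capacity model. The patent does not state how the prediction is made, so the linear per-node throughput model below, and all names and parameters (`StorageTier`, `qps_per_node`, `ingest_per_node`, `plan_scaling`), are assumptions for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class StorageTier:
    """One type of underlying storage serving data APIs of one SLA class."""
    name: str
    nodes: int
    qps_per_node: float      # assumed query rate one node sustains within the SLA
    ingest_per_node: float   # assumed ingest rate (MB/s) one node sustains


def nodes_needed(tier: StorageTier, current_qps: float,
                 new_api_qps: float, ingest_rate: float) -> int:
    """Estimate the node count that keeps the SLA once the new API is added."""
    by_query = math.ceil((current_qps + new_api_qps) / tier.qps_per_node)
    by_ingest = math.ceil(ingest_rate / tier.ingest_per_node)
    return max(by_query, by_ingest, 1)


def plan_scaling(tier: StorageTier, current_qps: float,
                 new_api_qps: float, ingest_rate: float) -> int:
    """Return the node delta: positive to expand, negative to shrink, 0 to keep."""
    return nodes_needed(tier, current_qps, new_api_qps, ingest_rate) - tier.nodes
```

A positive result means the new data API's SLA cannot be met at the current size and the tier should be expanded; a negative result means capacity can be reduced. A real system would likely replace the linear model with measured latency percentiles.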
Compared with the prior art, the invention has the following beneficial effects:
By adopting a highly efficient data storage and query mode (an Apache Hudi data lake plus Presto, an OLAP query engine based on the MPP architecture), with Alluxio used to reduce I/O overhead where necessary, the invention can handle most data query requests and, compared with the RESTful interfaces mostly used at present, can effectively meet the data requirements of various users. Because the APIs are built on a FaaS platform, the invention also effectively solves the problem that, when an abnormal API crashes the service, all data APIs become unavailable and fault isolation cannot be achieved.
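The fault isolation benefit can be sketched as a gateway that dispatches each data API to its own independently deployed function. This is a simplified in-process analogy, assuming hypothetical names (`Gateway`, `deploy`, `call`); in the described method the isolation boundary is a container cluster per service, not a `try`/`except` block.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[dict], dict]


class Gateway:
    """Unified entry point that routes each API name to its own function."""

    def __init__(self) -> None:
        self._functions: Dict[str, Handler] = {}

    def deploy(self, api_name: str, handler: Handler) -> None:
        """Register one data API as an independent unit."""
        self._functions[api_name] = handler

    def call(self, api_name: str, params: dict) -> Tuple[int, dict]:
        """Invoke one API; a failure is contained to that API only."""
        handler = self._functions.get(api_name)
        if handler is None:
            return 404, {"error": "unknown API"}
        try:
            return 200, handler(params)
        except Exception as exc:  # stand-in for a crashed function instance
            return 500, {"error": str(exc)}
```

If one deployed handler raises, `call` returns a 500 for that API alone while every other API keeps returning 200, which mirrors the contrast the text draws with running all interfaces on a single set of services.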
Drawings
FIG. 1 is a flow block diagram of the API construction of the present invention;
FIG. 2 is a flow block diagram of building the multi-source heterogeneous data exchange framework of the present invention;
FIG. 3 is a flow block diagram of building the stream-batch integrated data processing framework of the present invention;
FIG. 4 is a flow block diagram of building APIs on the FaaS platform of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, rather than all embodiments, and all other embodiments obtained by a person of ordinary skill in the art without any creative work based on the embodiments of the present invention belong to the protection scope of the present invention.
To facilitate an understanding of the invention, the invention will now be described more fully with reference to the accompanying drawings. Several embodiments of the invention are presented. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
It will be understood that when an element is referred to as being "secured to" another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like as used herein are for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Embodiment: referring to FIGS. 1-4,
a cloud native data API construction method based on a single data stream comprises the following steps:
10: building a multi-source heterogeneous data exchange framework, which provides a highly abstract, flexible, and extensible data exchange technical framework and supports data acquisition, export, data storage conversion during open sharing, and similar functions;
20: building a stream-batch integrated data processing framework;
30: storing data in an Apache Hudi data lake;
40: building APIs on a FaaS platform: data in the platform can be exported using the multi-source heterogeneous data exchange framework, or APIs can be generated directly to be called by other systems, or data can be pushed to downstream business systems through a message queue;
50: performing data queries on the Presto platform; the Alluxio storage system may also be used in step 30 to reduce I/O overhead.
Step 10 comprises the following steps:
101: data source abstraction: reading and writing of general data sources such as JDBC, file systems, and message queues is implemented, and a development framework and an integration method are provided so that users can develop drivers for other data sources themselves;
102: exchange behavior abstraction: for each data source, user-definable behaviors such as the exchange task run-time strategy, the strategy for writing new and old data, the dirty data filtering strategy, and the task run configuration are abstracted, and each data source implements them itself to the extent that its underlying technology supports them. For example, for task run time, immediate execution, timed execution, periodic execution, and streaming execution can be supported; for the new and old data handling strategy, full replacement, ignoring updates, saving as a new data version, and similar strategies can be supported; for task run configuration, different task executors (single machine with a single thread, Spark cluster, Flink cluster, etc.), rate limiting, resumable transfer from breakpoints, and the like can be supported;
103: external metadata import: if the data source side stores metadata of the imported data, such as field comments and primary/foreign key relationships, a development framework is provided for implementing the metadata import function;
104: page-based data source management, task monitoring and alarming, data collection cataloging, and data set relationship management are supported, and the management console can be extended using a low-code development framework.
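The three new-and-old data strategies named in step 102 (full replacement, ignoring updates, saving as a new data version) can be sketched as follows. The patent does not define their exact semantics, so the interpretation in the comments, the versioned in-memory store, and the names `WritePolicy` and `apply_write` are assumptions.

```python
from enum import Enum
from typing import Dict, List

Row = Dict[str, object]


class WritePolicy(Enum):
    FULL_REPLACE = "full_replace"      # assumed: drop the old data set entirely
    IGNORE_UPDATES = "ignore_updates"  # assumed: keep existing keys, add new ones
    NEW_VERSION = "new_version"        # assumed: append a new version per key


def apply_write(store: Dict[str, List[Row]],
                incoming: Dict[str, Row],
                policy: WritePolicy) -> None:
    """Apply incoming rows to `store`, which maps key -> versions (latest last)."""
    if policy is WritePolicy.FULL_REPLACE:
        store.clear()
        for key, row in incoming.items():
            store[key] = [row]
    elif policy is WritePolicy.IGNORE_UPDATES:
        for key, row in incoming.items():
            store.setdefault(key, [row])
    elif policy is WritePolicy.NEW_VERSION:
        for key, row in incoming.items():
            store.setdefault(key, []).append(row)
    else:
        raise ValueError(f"unsupported policy: {policy}")
```

Keeping the policy as an explicit parameter reflects the text's point that the behavior is user-definable per exchange task, while a given data source may support only the subset its underlying technology allows.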
Step 20 specifically comprises the following steps:
201: constructing stream-batch integrated data processing tasks;
202: scheduling processing tasks;
203: hierarchical management of processed data;
204: UDF management: UDF data processing functions written by users are uploaded to the platform so that processing tasks can call them.
Step 201 supports data processing through the SQL language, Spark programs, and Flink programs, and, where the underlying framework supports it, the same processing task code can be switched between stream mode and batch mode; step 202 also supports timed scheduling, dependency scheduling, and bringing scheduled tasks online and offline, so as to form a processing task workflow, and supports a timeout warning function.
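The idea that the same processing task code can run in either stream mode or batch mode can be shown with a toy sketch in plain Python. It does not use Spark or Flink; the names `run`, `run_batch`, and `run_stream` and the `"batch"`/`"stream"` mode strings are assumptions made for illustration.

```python
from typing import Callable, Iterable, Iterator, List, Union

Record = dict
Transform = Callable[[Record], Record]


def run_batch(transform: Transform, records: List[Record]) -> List[Record]:
    """Batch mode: process a bounded data set and return all results at once."""
    return [transform(r) for r in records]


def run_stream(transform: Transform, records: Iterable[Record]) -> Iterator[Record]:
    """Stream mode: process records one by one as they arrive."""
    for record in records:
        yield transform(record)


def run(transform: Transform, records: Iterable[Record],
        mode: str) -> Union[List[Record], Iterator[Record]]:
    """Run the same transform in the requested mode."""
    if mode == "batch":
        return run_batch(transform, list(records))
    if mode == "stream":
        return run_stream(transform, records)
    raise ValueError(f"unknown mode: {mode}")
```

The transform itself is written once; only the runner differs, so switching modes is a configuration change rather than a code change, which is the property the text attributes to step 201 when the underlying framework supports it.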
Step 40 comprises the following steps:
401: a FaaS-based data open interface engine: a data ad-hoc query interface engine based on FaaS technology, in which each ad-hoc query service corresponds to one container cluster, and which provides unified interface access, load balancing, and fault isolation, thereby effectively solving the problem, found with existing RESTful interfaces, that an abnormal API crashing the service makes all data APIs unavailable and fault isolation cannot be achieved;
402: a data push service, which pushes data to users in the form of a message queue;
403: data desensitization management: desensitization rules for the data objects in a data service are configured according to the permissions of the data service caller, and, in addition to the common field desensitization modes for ID card numbers, mobile phone numbers, and the like, a character-filling desensitization mode is provided;
404: SLA-based storage scheduling: because different data stores and query frameworks can provide APIs with different SLAs, the framework provides different types of data storage corresponding to data APIs of different SLA types, and predicts, from the current data API call volume and the data integration speed, whether the SLA of a new data API can be met, so as to scale the underlying storage up or down.
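The character-filling desensitization of step 403, configured per caller permission, can be sketched as below. The rule format, the role names, and the functions `mask_fill` and `desensitize` are hypothetical; the patent only states that rules depend on the caller's permissions and that a character-filling mode exists.

```python
from typing import Dict, Tuple


def mask_fill(value: str, keep_head: int = 3, keep_tail: int = 4,
              fill: str = "*") -> str:
    """Replace the middle of `value` with fill characters, keeping both ends."""
    if len(value) <= keep_head + keep_tail:
        # Too short to keep anything safely: mask the whole value.
        return fill * len(value)
    middle = len(value) - keep_head - keep_tail
    return value[:keep_head] + fill * middle + value[len(value) - keep_tail:]


# Hypothetical rule table: role -> field -> (keep_head, keep_tail).
RULES: Dict[str, Dict[str, Tuple[int, int]]] = {
    "admin": {},                                       # sees raw values
    "analyst": {"phone": (3, 4), "id_card": (6, 4)},   # sees partially masked values
}


def desensitize(row: Dict[str, str], role: str) -> Dict[str, str]:
    """Apply the caller's desensitization rules; unknown roles see nothing."""
    rules = RULES.get(role)
    if rules is None:
        return {key: mask_fill(val, 0, 0) for key, val in row.items()}
    return {key: mask_fill(val, *rules[key]) if key in rules else val
            for key, val in row.items()}
```

With these assumed rules, an `analyst` caller receives a phone number such as `13812345678` as `138****5678`, while a caller with an unrecognized role receives fully masked fields.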
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (7)

1. A cloud native data API construction method based on a single data stream, comprising the following steps:
10: building a multi-source heterogeneous data exchange framework;
20: building a stream-batch integrated data processing framework;
30: storing data in an Apache Hudi data lake;
40: building APIs on a FaaS platform;
50: performing data queries on the Presto platform.
2. The cloud native data API construction method based on a single data stream according to claim 1, characterized in that: step 30 may also employ an Alluxio storage system to reduce I/O overhead.
3. The cloud native data API construction method based on a single data stream according to claim 1, characterized in that: step 10 comprises the following steps:
101: data source abstraction: reading and writing of general data sources such as JDBC, file systems, and message queues is implemented, and a development framework and an integration method are provided so that users can develop drivers for other data sources themselves;
102: exchange behavior abstraction: for each data source, user-definable behaviors such as the exchange task run-time strategy, the strategy for writing new and old data, the dirty data filtering strategy, and the task run configuration are abstracted, and each data source implements them itself to the extent that its underlying technology supports them;
103: external metadata import: if the data source side stores metadata of the imported data, such as field comments and primary/foreign key relationships, a development framework is provided for implementing the metadata import function;
104: page-based data source management, task monitoring and alarming, data collection cataloging, and data set relationship management are supported, and the management console can be extended using a low-code development framework.
4. The cloud native data API construction method based on a single data stream according to claim 1, characterized in that: step 20 specifically comprises the following steps:
201: constructing stream-batch integrated data processing tasks;
202: scheduling processing tasks;
203: hierarchical management of processed data;
204: UDF management: UDF data processing functions written by users are uploaded to the platform so that processing tasks can call them.
5. The cloud native data API construction method based on a single data stream according to claim 4, characterized in that: step 201 supports data processing through the SQL language, Spark programs, and Flink programs, and, where the underlying framework supports it, the same processing task code can be switched between stream mode and batch mode; step 202 also supports timed scheduling, dependency scheduling, and bringing scheduled tasks online and offline, so as to form a processing task workflow, and supports a timeout warning function.
6. The cloud native data API construction method based on a single data stream according to claim 1, characterized in that: step 40 comprises the following steps:
401: a FaaS-based data open interface engine: a data ad-hoc query interface engine based on FaaS technology, in which each ad-hoc query service corresponds to one container cluster, and which provides unified interface access, load balancing, and fault isolation;
402: a data push service, which pushes data to users in the form of a message queue;
403: data desensitization management: desensitization rules for the data objects in a data service are configured according to the permissions of the data service caller, and, in addition to the common field desensitization modes for ID card numbers, mobile phone numbers, and the like, a character-filling desensitization mode is provided;
404: SLA-based storage scheduling.
7. The cloud native data API construction method based on a single data stream according to claim 6, characterized in that: in step 404, because different data stores and query frameworks can provide APIs with different SLAs, the framework provides different types of data storage corresponding to data APIs of different SLA types, and predicts, from the current data API call volume and the data integration speed, whether the SLA of a new data API can be met, so as to scale the underlying storage up or down.
CN202210244542.8A 2022-03-14 2022-03-14 Cloud native data API construction method based on single data stream Pending CN114579097A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210244542.8A CN114579097A (en) 2022-03-14 2022-03-14 Cloud native data API construction method based on single data stream

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210244542.8A CN114579097A (en) 2022-03-14 2022-03-14 Cloud native data API construction method based on single data stream

Publications (1)

Publication Number Publication Date
CN114579097A (en) 2022-06-03

Family

ID=81779931

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210244542.8A Pending CN114579097A (en) 2022-03-14 2022-03-14 Cloud native data API construction method based on single data stream

Country Status (1)

Country Link
CN (1) CN114579097A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117370315A (en) * 2023-12-04 2024-01-09 成都数之联科技股份有限公司 Multi-type data source acquisition and warehousing method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN111061788B (en) Multi-source heterogeneous data conversion integration system based on cloud architecture and implementation method thereof
CN102231869B (en) Realization method for refinement operation system architecture of valued-added service
CN107070890A (en) Flow data processing device and communication network major clique system in a kind of communication network major clique system
US20180225344A1 (en) Database access control method and apparatus
CN107103064B (en) Data statistical method and device
CN104679595B (en) A kind of application oriented IaaS layers of dynamic resource allocation method
CN114443435A (en) Container micro-service oriented performance monitoring alarm method and alarm system
CN106156047B (en) A kind of SNAPSHOT INFO processing method and processing device
CN103365971A (en) Mass data access processing system based on cloud computing
CN110532074A (en) A kind of method for scheduling task and system of multi-tenant Mode S aaS service cluster environment
CN109885642B (en) Hierarchical storage method and device for full-text retrieval
EP3817339A2 (en) Method and system for management of an artificial intelligence development platform
CN102354296A (en) Monitoring system and method capable of expanding monitoring resources
CN108921728A (en) Distributed real-time database system based on power network dispatching system
CN109597837A (en) Storage method, querying method and the relevant device of time series data
CN104166661A (en) Data storage system and method
CN114579097A (en) Cloud native data API construction method based on single data stream
CN110113406A (en) Based on distributed calculating service cluster frame
CN115033646A (en) Method for constructing real-time warehouse system based on Flink and Doris
CN109977145A (en) A kind of database auto-partition management method and system based on horizontal partitioning
CN109597825A (en) Regulation engine call method, device, equipment and computer readable storage medium
CN116431635A (en) Lake and warehouse integrated-based power distribution Internet of things data real-time processing system and method
US20230229461A1 (en) Correlation engine and policy manager (cpe), method and computer program product
CN115357433A (en) Database backup method, device, equipment and storage medium under container environment
CN109150593A (en) The management method and device of resource in cloud data system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20220603