CN114579097A - Cloud native data API construction method based on single data stream - Google Patents
- Publication number: CN114579097A
- Application number: CN202210244542.8A
- Authority: CN (China)
- Prior art keywords: data, API, framework, construction method, method based
- Prior art date
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/20—Software design
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the technical field of networks, and in particular to a cloud native data API construction method based on a single data stream, comprising the following steps: 10: building a multi-source heterogeneous data exchange framework; 20: building a stream-batch integrated data processing framework; 30: storing data in an Apache Hudi data lake; 40: building APIs on a FaaS platform; 50: performing data query on the Presto platform. By combining Apache Hudi data lake storage with an OLAP query engine based on an MPP architecture, the method can handle most data query requests and, compared with most Restful interfaces currently in use, can effectively meet the data requirements of various users. Because the APIs are built on a FaaS platform, the method also addresses the problem that, when an abnormal API crashes the service, all data APIs become unavailable and fault isolation cannot be achieved.
Description
Technical Field
The invention relates to the technical field of networks, and in particular to a cloud native data API construction method based on a single data stream.
Background
In data analysis and utilization, the data itself, the analysis models built on it, and the data applications built on both have great value when openly shared. The traditional means of open data sharing is data export, for example to local disks, FTP servers, or distributed file systems. Export suits temporary, large-volume data exchange, but once data is exported the platform can no longer collect any information about it: it cannot collect or audit information about data users, and it cannot export models or data applications, which greatly limits the functional boundary of the data service. Therefore, when building an application ecosystem, interfaces, and Restful interfaces in particular, have become a more popular form of service delivery. Data, models, and applications can all be opened through Restful interfaces, and the platform can obtain basic information about the caller whenever an interface is called, which facilitates permission management and traffic and concurrency control and so supports a better and more stable data service.
However, the Restful interfaces supplied by many vendors today require customers to state their requirements clearly in advance; the vendor then invests research and development effort and completes the interfaces in one pass when the platform is delivered. This model is clearly inflexible and costly. For data APIs, some open source technologies and platform vendors implement an explicit data API customization function, so that users can define their data requirements and the platform provides open data services in the form of APIs. However, most current technologies run all interfaces on a single set of services and cannot dynamically adjust resources for each interface at a fine granularity; when an abnormal API crashes the service, all data APIs become unavailable and fault isolation cannot be achieved.
Therefore, a cloud native data API construction method based on a single data stream is needed to address the above problems.
Disclosure of Invention
The invention aims to provide a cloud native data API construction method based on a single data stream, so as to solve the problems described in the Background.
To achieve this purpose, the invention provides the following technical solution:
A cloud native data API construction method based on a single data stream comprises the following steps:
10: building a multi-source heterogeneous data exchange framework;
20: building a stream-batch integrated data processing framework;
30: storing data in an Apache Hudi data lake;
40: building APIs on a FaaS platform;
50: performing data query on the Presto platform.
As a preferred embodiment of the present invention, step 30 may also use an Alluxio storage system to reduce I/O overhead.
As a preferred embodiment of the present invention, step 10 comprises the following steps:
101: data source abstraction: reading and writing of general data sources such as JDBC, file systems, and message queues is supported, and a development framework and an integration method are provided so that users can develop drivers for other data sources themselves;
102: exchange behavior abstraction: for each data source, user-definable behaviors such as the exchange task run-time strategy, the strategy for writing new and old data, the dirty data filtering strategy, and the task run configuration are abstracted, and each data source can implement them according to what its underlying technology supports;
103: external metadata import: if the data source side stores metadata for the imported data, such as field comments and primary and foreign key relationships, a development framework is provided to implement import of that metadata;
104: page-based data source management, task monitoring and alarming, data collection cataloging, and data set relationship management are supported, as is extension of the management console with a low-code development framework.
As a preferred embodiment of the present invention, step 20 comprises the following steps:
201: constructing stream-batch integrated data processing tasks;
202: scheduling processing tasks;
203: hierarchical management of processed data;
204: UDF management: user-written UDF data processing functions are uploaded to the platform so that processing tasks can call them.
As a preferred embodiment of the present invention, step 201 supports data processing through the SQL language, Spark programs, and Flink programs, and, where the underlying framework supports it, the same processing task code can be switched between a stream run mode and a batch run mode. Step 202 supports timed scheduling, dependency scheduling, and bringing scheduled tasks online and offline, so that processing tasks form a workflow, and it also supports a timeout early-warning function.
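The dependency scheduling and timeout warning described for step 202 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names `plan_workflow` and `overdue_tasks`, the task names, and the time limits are all hypothetical, and the ordering relies on the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def plan_workflow(dependencies: dict[str, set[str]]) -> list[str]:
    """Return a run order in which every task follows its upstream tasks.

    `dependencies` maps a task name to the set of tasks it depends on.
    """
    return list(TopologicalSorter(dependencies).static_order())


def overdue_tasks(durations: dict[str, float], limits: dict[str, float]) -> list[str]:
    """List tasks whose measured run time exceeded their configured limit."""
    return sorted(
        task for task, seconds in durations.items()
        if seconds > limits.get(task, float("inf"))
    )


# Hypothetical three-stage workflow: raw -> cleaned -> report.
deps = {"raw": set(), "cleaned": {"raw"}, "report": {"cleaned"}}
order = plan_workflow(deps)

# Hypothetical measured run times and per-task limits, in seconds.
late = overdue_tasks({"raw": 5, "cleaned": 12}, {"raw": 10, "cleaned": 10})
```

A real scheduler would also need timed triggers and online/offline state for each task; the sketch covers only the ordering and the timeout check.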
As a preferred embodiment of the present invention, step 40 comprises the following steps:
401: a FaaS-based open data interface engine: a data ad-hoc query interface engine built on FaaS technology, in which each ad-hoc query service corresponds to one container cluster, providing unified interface access, load balancing, and fault isolation;
402: a data push service, which pushes data to users in the form of message queues;
403: data desensitization management: desensitization rules for the data objects in a data service are configured according to the permissions of the data service caller, and, in addition to the currently common field desensitization modes for ID card numbers, mobile phone numbers, and the like, a character-filling desensitization mode is provided;
404: SLA-based storage scheduling.
In step 404, because different data stores and query frameworks can offer different API SLAs, the framework provides several types of data storage, each corresponding to data APIs of a different SLA type. It also predicts, from the current data API call volume and the data integration speed, whether the SLA of a new data API can be met, and expands or shrinks the underlying storage accordingly.
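The patent does not say how the SLA prediction in step 404 is computed. The Python sketch below shows one possible reading under stated assumptions: load is measured in queries per second, data integration speed is modeled as a fractional growth factor, and the `headroom` and `low_watermark` thresholds are invented for illustration. All names (`StorageTier`, `can_meet_sla`, `scaling_decision`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StorageTier:
    name: str
    capacity_qps: float  # assumed sustainable queries per second for this tier
    current_qps: float   # observed load from the existing data APIs


def projected_load(tier: StorageTier, new_api_qps: float, ingest_growth: float) -> float:
    """Estimate load after adding a new API, inflated by data growth (assumption)."""
    return (tier.current_qps + new_api_qps) * (1 + ingest_growth)


def can_meet_sla(tier: StorageTier, new_api_qps: float,
                 ingest_growth: float = 0.0, headroom: float = 0.2) -> bool:
    """True if the projected load stays within capacity minus a safety margin."""
    return projected_load(tier, new_api_qps, ingest_growth) <= tier.capacity_qps * (1 - headroom)


def scaling_decision(tier: StorageTier, new_api_qps: float,
                     ingest_growth: float = 0.0, low_watermark: float = 0.3) -> str:
    """Return 'scale_out', 'scale_in', or 'hold' for the underlying storage."""
    if not can_meet_sla(tier, new_api_qps, ingest_growth):
        return "scale_out"
    if projected_load(tier, new_api_qps, ingest_growth) < tier.capacity_qps * low_watermark:
        return "scale_in"
    return "hold"


busy = StorageTier("hot", capacity_qps=1000, current_qps=500)
idle = StorageTier("cold", capacity_qps=1000, current_qps=50)
```

With these hypothetical numbers, adding a 200 QPS API to `busy` with 10% growth stays within the margin, while adding a 400 QPS API would trigger a scale-out.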
Compared with the prior art, the invention has the following beneficial effects:
By adopting the most energy-efficient data storage and query approach (an Apache Hudi data lake combined with Presto, an OLAP query engine based on an MPP architecture), with Alluxio used to reduce I/O overhead if necessary, the invention can handle most data query requests and, compared with most Restful interfaces currently in use, can effectively meet the data requirements of various users. Because the APIs are built on a FaaS platform, the invention also addresses the problem that, when an abnormal API crashes the service, all data APIs become unavailable and fault isolation cannot be achieved.
Drawings
FIG. 1 is a block flow diagram of the API construction of the present invention;
FIG. 2 is a block diagram of a multi-source heterogeneous data exchange framework building process according to the present invention;
FIG. 3 is a block diagram of a flow of building a batch-flow integrated data processing framework according to the present invention;
FIG. 4 is a block diagram of an API establishment flow based on the FaaS platform.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, rather than all embodiments, and all other embodiments obtained by a person of ordinary skill in the art without any creative work based on the embodiments of the present invention belong to the protection scope of the present invention.
To facilitate an understanding of the invention, the invention will now be described more fully with reference to the accompanying drawings. Several embodiments of the invention are presented. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
It will be understood that when an element is referred to as being "secured to" another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like as used herein are for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Embodiment: referring to FIGS. 1-4,
a cloud native data API construction method based on a single data stream comprises the following steps:
10: building a multi-source heterogeneous data exchange framework, which provides a highly abstract, flexible, and extensible data exchange framework and supports functions such as data acquisition, data export, and conversion between supported data stores during open sharing;
20: building a stream-batch integrated data processing framework;
30: storing data in an Apache Hudi data lake;
40: building APIs on a FaaS platform: data in the platform can be exported with the multi-source heterogeneous data exchange framework, exposed directly as an API for other systems to call, or pushed to downstream business systems through a message queue;
50: performing data query on the Presto platform. In step 30, the Alluxio storage system may also be used to reduce I/O overhead.
Step 10 comprises the following steps:
101: data source abstraction: reading and writing of general data sources such as JDBC, file systems, and message queues is supported, and a development framework and an integration method are provided so that users can develop drivers for other data sources themselves;
102: exchange behavior abstraction: for each data source, user-definable behaviors such as the exchange task run-time strategy, the strategy for writing new and old data, the dirty data filtering strategy, and the task run configuration are abstracted, and each data source can implement them according to what its underlying technology supports. For example, for the task run time, immediate execution, timed execution, periodic execution, and streaming execution can be supported; for the new and old data handling strategy, strategies such as full replacement, ignoring updates, and saving as a new data version can be supported; for the task run configuration, different task executors (single machine with a single thread, Spark cluster, Flink cluster, and the like), rate limiting, resumable transfer from a breakpoint, and the like can be supported;
103: external metadata import: if the data source side stores metadata for the imported data, such as field comments and primary and foreign key relationships, a development framework is provided to implement import of that metadata;
104: page-based data source management, task monitoring and alarming, data collection cataloging, and data set relationship management are supported, as is extension of the management console with a low-code development framework.
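To make the data source and exchange behavior abstractions of steps 101 and 102 concrete, the following Python sketch models a data source as an abstract class with pluggable write strategies and a dirty data filter. It is an illustrative assumption rather than the patent's framework: `DataSource`, `WriteStrategy`, `InMemorySource`, and `exchange` are hypothetical names, and rows are assumed to be dictionaries keyed by an `id` field.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Callable, Iterable, Iterator


class WriteStrategy(Enum):
    FULL_REPLACE = "full_replace"      # discard old rows, keep only incoming rows
    IGNORE_UPDATES = "ignore_updates"  # keep old rows, add only unseen ids
    NEW_VERSION = "new_version"        # store the merged result as a new version


class DataSource(ABC):
    """Driver interface a user would implement for each kind of source."""

    @abstractmethod
    def read(self) -> Iterator[dict]: ...

    @abstractmethod
    def write(self, rows: Iterable[dict], strategy: WriteStrategy) -> None: ...


class InMemorySource(DataSource):
    """Toy driver that keeps versions of a table in memory."""

    def __init__(self, rows: Iterable[dict] = ()):
        self.versions = [{row["id"]: row for row in rows}]

    def read(self) -> Iterator[dict]:
        return iter(self.versions[-1].values())

    def write(self, rows: Iterable[dict], strategy: WriteStrategy) -> None:
        latest = self.versions[-1]
        incoming = {row["id"]: row for row in rows}
        if strategy is WriteStrategy.FULL_REPLACE:
            self.versions[-1] = incoming
        elif strategy is WriteStrategy.IGNORE_UPDATES:
            self.versions[-1] = {**incoming, **latest}  # old rows win on conflict
        else:
            self.versions.append({**latest, **incoming})  # new rows win, old version kept


def exchange(source: DataSource, target: DataSource, strategy: WriteStrategy,
             is_dirty: Callable[[dict], bool] = lambda row: False) -> int:
    """Copy rows from source to target, dropping dirty rows; return rows written."""
    clean = [row for row in source.read() if not is_dirty(row)]
    target.write(clean, strategy)
    return len(clean)


src = InMemorySource([{"id": 1, "v": "a"}, {"id": 2, "v": None}])
dst = InMemorySource([{"id": 1, "v": "old"}])
written = exchange(src, dst, WriteStrategy.IGNORE_UPDATES, is_dirty=lambda r: r["v"] is None)
```

A JDBC, file system, or message queue driver would implement the same two methods; the run-time strategy and executor choice from step 102 are left out of this sketch.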
Step 20 comprises the following steps:
201: constructing stream-batch integrated data processing tasks;
202: scheduling processing tasks;
203: hierarchical management of processed data;
204: UDF management: user-written UDF data processing functions are uploaded to the platform so that processing tasks can call them.
Step 201 supports data processing through the SQL language, Spark programs, and Flink programs, and, where the underlying framework supports it, the same processing task code can be switched between a stream run mode and a batch run mode. Step 202 supports timed scheduling, dependency scheduling, and bringing scheduled tasks online and offline, so that processing tasks form a workflow, and it also supports a timeout early-warning function.
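The idea that one piece of processing task code can run in either stream or batch mode, together with the UDF registration of step 204, can be sketched in plain Python as follows. This is only an analogy: in Spark or Flink the switch is made by the engine, whereas here a list stands in for a batch and a generator for a stream. The registry, the `udf` decorator, and the `normalize_name` function are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Union

Row = dict
UDF_REGISTRY: Dict[str, Callable[[Row], Row]] = {}


def udf(name: str) -> Callable[[Callable[[Row], Row]], Callable[[Row], Row]]:
    """Register a user-written row function under a name (stand-in for an upload)."""
    def decorator(func: Callable[[Row], Row]) -> Callable[[Row], Row]:
        UDF_REGISTRY[name] = func
        return func
    return decorator


@udf("normalize_name")
def normalize_name(row: Row) -> Row:
    return {**row, "name": row["name"].strip().lower()}


def run_batch(task: Callable[[Row], Row], rows: Iterable[Row]) -> List[Row]:
    """Batch mode: process the whole input and return all results at once."""
    return [task(row) for row in rows]


def run_stream(task: Callable[[Row], Row], rows: Iterable[Row]) -> Iterator[Row]:
    """Stream mode: yield each result as soon as its input row arrives."""
    for row in rows:
        yield task(row)


def run(udf_name: str, rows: Iterable[Row], mode: str) -> Union[List[Row], Iterator[Row]]:
    """Run the same registered task code in the requested mode."""
    task = UDF_REGISTRY[udf_name]
    if mode == "batch":
        return run_batch(task, rows)
    if mode == "stream":
        return run_stream(task, rows)
    raise ValueError(f"unknown mode: {mode}")


rows = [{"name": "  Alice "}, {"name": "BOB"}]
batch_result = run("normalize_name", rows, "batch")
stream_result = list(run("normalize_name", iter(rows), "stream"))
```

The task function itself is unchanged between the two calls; only the runner differs, which is the property the text attributes to the underlying framework.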
Step 40 comprises the following steps:
401: a FaaS-based open data interface engine: a data ad-hoc query interface engine built on FaaS technology, in which each ad-hoc query service corresponds to one container cluster, providing unified interface access, load balancing, and fault isolation. This effectively addresses the problem with Restful interfaces that, when an abnormal API crashes the service, all data APIs become unavailable and fault isolation cannot be achieved;
402: a data push service, which pushes data to users in the form of message queues;
403: data desensitization management: desensitization rules for the data objects in a data service are configured according to the permissions of the data service caller, and, in addition to the currently common field desensitization modes for ID card numbers, mobile phone numbers, and the like, a character-filling desensitization mode is provided;
404: SLA-based storage scheduling: because different data stores and query frameworks can offer different API SLAs, the framework provides several types of data storage, each corresponding to data APIs of a different SLA type. It also predicts, from the current data API call volume and the data integration speed, whether the SLA of a new data API can be met, and expands or shrinks the underlying storage accordingly.
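The character-filling desensitization of step 403, applied according to the caller's permissions, might look like the following Python sketch. The roles, the rule table, and the choice to fully mask every field for unknown callers are assumptions made for illustration; the patent does not specify them.

```python
from typing import Callable, Dict


def mask_fill(value: str, keep_head: int = 3, keep_tail: int = 4, fill: str = "*") -> str:
    """Replace the middle of a string with a fill character, keeping both ends."""
    if len(value) <= keep_head + keep_tail:
        return fill * len(value)  # too short to keep any part readable
    middle = len(value) - keep_head - keep_tail
    return value[:keep_head] + fill * middle + value[len(value) - keep_tail:]


# Hypothetical per-role rules: field name -> masking function.
RULES: Dict[str, Dict[str, Callable[[str], str]]] = {
    "admin": {},  # sees everything
    "analyst": {"phone": lambda v: mask_fill(v, 3, 4)},
}


def desensitize(row: Dict[str, str], role: str) -> Dict[str, str]:
    """Apply the caller's rules; unknown roles get every field fully masked."""
    if role not in RULES:
        return {key: mask_fill(str(value), 0, 0) for key, value in row.items()}
    rules = RULES[role]
    return {key: rules[key](value) if key in rules else value for key, value in row.items()}


record = {"phone": "13812345678", "city": "Chengdu"}
analyst_view = desensitize(record, "analyst")
guest_view = desensitize(record, "guest")
```

In this sketch the rule lookup happens per call, so the same data API can return differently masked rows to different callers.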
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (7)
1. A cloud native data API construction method based on a single data stream, comprising the following steps:
10: building a multi-source heterogeneous data exchange framework;
20: building a stream-batch integrated data processing framework;
30: storing data in an Apache Hudi data lake;
40: building APIs on a FaaS platform;
50: performing data query on the Presto platform.
2. The cloud native data API construction method based on the single data stream according to claim 1, characterized in that: the step 30 may also employ an Alluxio storage system to reduce I/O overhead.
3. The cloud native data API construction method based on the single data stream according to claim 1, characterized in that: the step 10 comprises the following steps:
101: data source abstraction: reading and writing of general data sources such as JDBC, file systems, and message queues is supported, and a development framework and an integration method are provided so that users can develop drivers for other data sources themselves;
102: exchange behavior abstraction: for each data source, user-definable behaviors such as the exchange task run-time strategy, the strategy for writing new and old data, the dirty data filtering strategy, and the task run configuration are abstracted, and each data source can implement them according to what its underlying technology supports;
103: external metadata import: if the data source side stores metadata for the imported data, such as field comments and primary and foreign key relationships, a development framework is provided to implement import of that metadata;
104: page-based data source management, task monitoring and alarming, data collection cataloging, and data set relationship management are supported, as is extension of the management console with a low-code development framework.
4. The cloud native data API construction method based on the single data stream according to claim 1, characterized in that: the specific steps of step 20 include the following:
201: constructing stream-batch integrated data processing tasks;
202: scheduling processing tasks;
203: hierarchical management of processed data;
204: UDF management: user-written UDF data processing functions are uploaded to the platform so that processing tasks can call them.
5. The cloud native data API construction method based on the single data stream according to claim 4, characterized in that: step 201 supports data processing through the SQL language, Spark programs, and Flink programs, and, where the underlying framework supports it, the same processing task code can be switched between a stream run mode and a batch run mode; step 202 supports timed scheduling, dependency scheduling, and bringing scheduled tasks online and offline, so that processing tasks form a workflow, and it also supports a timeout early-warning function.
6. The cloud native data API construction method based on the single data stream according to claim 1, characterized in that: the step 40 comprises the steps of:
401: a FaaS-based open data interface engine: a data ad-hoc query interface engine built on FaaS technology, in which each ad-hoc query service corresponds to one container cluster, providing unified interface access, load balancing, and fault isolation;
402: a data push service, which pushes data to users in the form of message queues;
403: data desensitization management: desensitization rules for the data objects in a data service are configured according to the permissions of the data service caller, and, in addition to the currently common field desensitization modes for ID card numbers, mobile phone numbers, and the like, a character-filling desensitization mode is provided;
404: SLA-based storage scheduling.
7. The cloud native data API construction method based on the single data stream according to claim 6, characterized in that: in step 404, because different data stores and query frameworks can offer different API SLAs, the framework provides several types of data storage, each corresponding to data APIs of a different SLA type, and predicts, from the current data API call volume and the data integration speed, whether the SLA of a new data API can be met, expanding or shrinking the underlying storage accordingly.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210244542.8A CN114579097A (en) | 2022-03-14 | 2022-03-14 | Cloud native data API construction method based on single data stream |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210244542.8A CN114579097A (en) | 2022-03-14 | 2022-03-14 | Cloud native data API construction method based on single data stream |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114579097A (en) | 2022-06-03 |
Family
ID=81779931
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210244542.8A Pending CN114579097A (en) | 2022-03-14 | 2022-03-14 | Cloud native data API construction method based on single data stream |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114579097A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117370315A (en) * | 2023-12-04 | 2024-01-09 | 成都数之联科技股份有限公司 | Multi-type data source acquisition and warehousing method, device, equipment and medium |
2022
- 2022-03-14 CN CN202210244542.8A patent/CN114579097A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20220603 |