WO2022256969A1 - A General Data Extraction System (一种通用数据抽取的系统) - Google Patents


Info

Publication number
WO2022256969A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
extraction
producer
interface
request
Application number
PCT/CN2021/098638
Other languages
English (en)
French (fr)
Inventor
王怀亮 (Wang Huailiang)
Original Assignee
京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Application filed by 京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Priority to CN202180001459.1A (patent CN115836284A)
Priority to PCT/CN2021/098638 (patent WO2022256969A1)
Publication of WO2022256969A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25: Integrating or interfacing systems involving database management systems

Definitions

  • The present disclosure relates to the field of data extraction, and in particular to a general data extraction system.
  • The database is the core asset of a business system and has become a core asset of the enterprise, so it is usually not open to third parties; as a result, business data cannot be exchanged between enterprises.
  • A large-scale project usually needs to integrate business data provided by different enterprises or different business departments (data producers).
  • However, the processes of these businesses are not interoperable and their data is not shared, which makes it very difficult for users developing large-scale projects (data consumers) to analyze and use the data, develop reports, perform analysis and mining, and so on.
  • The present disclosure provides a general data extraction system to solve the above technical problems in the prior art.
  • The general data extraction system provided by the embodiments of the present disclosure is applied to the data consumer.
  • The system is based on a microservice architecture and includes a service registration center.
  • The technical solution is as follows:
  • At least one configuration service, each configuration service being configured to perform the related configuration for an extraction task that extracts data from a corresponding data producer;
  • At least one execution service, configured to execute the extraction task and map the extracted data to a target location; wherein both the configuration service and the execution service are registered with the microservice architecture, and both use the REST interface to transmit data when communicating with the data producer.
  • The REST interface includes:
  • a REST-based authentication interface and a REST-based data extraction interface, wherein the authentication interface is configured to obtain authorization information for accessing the data producer, and the data extraction interface is configured to extract data from the data producer using the authorization information.
  • The configuration service is further configured such that:
  • the authorization request carries the user name and user password required for registration at the data consumer.
  • The configuration service is further configured such that:
  • the fields contained in the source data table are mapped to the fields of the target data table of the data consumer, and a corresponding extraction task is established.
  • The configuration service is further configured such that:
  • the fields included in the source data table and the corresponding field types are acquired based on the sample data.
  • The configuration service is further configured to:
  • define the data source of the extraction task based on the data producer and the source data table, and define a target data model corresponding to the target data table; wherein the target data model includes data from the data producer
  • The execution service is configured such that:
  • the returned data is converted into data in the target data table through the extraction task for storage.
  • The REST interface includes:
  • a return value, used to represent the data returned by the data producer based on the request parameters.
  • The request parameters and the return value use the JSON data format.
  • For the authentication interface, the request parameter carries the authorization request,
  • and the return value carries the authorization information;
  • for the data extraction interface, the request parameter carries the authorization information and the data extraction request,
  • and the return value carries the data returned by the data producer in response to the data extraction request.
  • The definition of the data source, the configuration of the target data model, and the establishment of the mapping relationship are completed through a graphical interface based on user operations.
  • Both the configuration service and the execution service use container technology, and each configuration service and each execution service runs in its own container.
  • FIG. 1 is a schematic structural diagram of a general data extraction system provided by an embodiment of the present disclosure;
  • FIG. 2 is a diagram of the relationship between a data consumer and a data producer provided by an embodiment of the present disclosure;
  • FIG. 3 is a schematic structural diagram of the system used by the operation platform of the data consumer provided by an embodiment of the present disclosure;
  • FIG. 4 is a first schematic diagram of the graphical interface configuration for data source definition provided by an embodiment of the present disclosure;
  • FIG. 5 is a second schematic diagram of the graphical interface configuration for data source definition provided by an embodiment of the present disclosure;
  • FIG. 6 is a schematic diagram of the parameter configuration in the graphical interface for the target data model provided by an embodiment of the present disclosure;
  • FIG. 7 is a schematic diagram of the parameter configuration in the graphical interface for the target data table provided by an embodiment of the present disclosure;
  • FIG. 8 is a graphical interface for configuring the mapping relationship between a source data table and a target data table provided by an embodiment of the present disclosure;
  • FIG. 9 is a graphical interface for configuring an extraction task provided by an embodiment of the present disclosure;
  • FIG. 10 is a schematic diagram of the relationship between the configuration service and the execution service provided by an embodiment of the present disclosure.
  • Embodiments of the present disclosure provide a general data extraction system to solve the above-mentioned technical problems existing in the prior art.
  • An embodiment of the present disclosure provides a general data extraction system that is applied to the data consumer.
  • The system is based on the microservice architecture and includes a service registration center 1.
  • The system includes:
  • At least one configuration service 2, where a configuration service 2 is configured to perform the related configuration for extracting data from a corresponding data producer, the related configuration including a REST-based data structure;
  • At least one execution service 3, where an execution service 3 is configured to execute the extraction task and map the extracted data to the target location; wherein both the configuration service and the execution service are registered in the microservice architecture, and both use the REST interface to transfer data when communicating with the data producer.
  • The data producer can be the database of an external enterprise, institution, or business department, and the data consumer can be a party that needs to integrate some of the data in the database of such an external enterprise, institution, or business department. Please refer to FIG. 2 for a diagram of the relationship between a data consumer and a data producer provided by an embodiment of the present disclosure.
  • For example, manufacturer C, as the data consumer, needs to build an operation platform.
  • The data for the database of the operation platform needs to be extracted from the smart security database of manufacturer A and the smart energy database of manufacturer B, which act as the data producers.
  • FIG. 3 is a schematic structural diagram of a system used by an operation platform of a data consumer provided by an embodiment of the present disclosure.
  • The system is based on the microservice architecture.
  • The system includes a registration center cluster.
  • The registration center cluster may be composed of multiple service registration centers 1.
  • A service registration center 1 consists of servers that provide registration services for the services in the system, and different service registration centers 1 may provide different types of registration services.
  • Spring Eureka (a tool with service registration and discovery capabilities) is usually used as the service registration center 1. Through Spring Eureka, the services contained in the system (including configuration services, execution services, etc.) can be queried together with their registration status, availability, and other state, so that the registered services can be managed and controlled.
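  • The registration-and-discovery role described above can be sketched as a small in-memory registry. This is a minimal illustration, not Spring Eureka's API: the class and method names, the 90-second lease, and the service names are assumptions chosen for the example. Each instance renews a lease by heartbeat and is reported DOWN once the lease expires, which is the general idea behind lease-based registries.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Instance:
    """One registered instance of a service (hypothetical model)."""
    service_name: str  # e.g. "config-service-a" (illustrative name)
    address: str       # host:port where the instance listens
    last_heartbeat: float = field(default_factory=time.monotonic)


class ServiceRegistry:
    """In-memory registration center: register, renew by heartbeat, query status."""

    def __init__(self, lease_seconds: float = 90.0) -> None:
        # An instance counts as DOWN once its last heartbeat is older than the lease.
        self.lease_seconds = lease_seconds
        self._instances: Dict[Tuple[str, str], Instance] = {}

    def register(self, service_name: str, address: str) -> None:
        self._instances[(service_name, address)] = Instance(service_name, address)

    def heartbeat(self, service_name: str, address: str) -> None:
        self._instances[(service_name, address)].last_heartbeat = time.monotonic()

    def services(self) -> List[str]:
        """Names of all registered services (the service list)."""
        return sorted({i.service_name for i in self._instances.values()})

    def status(self, service_name: str, now: Optional[float] = None) -> Dict[str, str]:
        """Map each instance address of a service to "UP" or "DOWN"."""
        now = time.monotonic() if now is None else now
        return {
            i.address: "UP" if now - i.last_heartbeat <= self.lease_seconds else "DOWN"
            for i in self._instances.values()
            if i.service_name == service_name
        }


registry = ServiceRegistry()
registry.register("config-service-a", "10.0.0.1:8001")
registry.register("exec-service-a", "10.0.0.2:8002")
print(registry.services())  # ['config-service-a', 'exec-service-a']
```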
  • The system also includes a business service cluster. According to the service list obtained from the registration center cluster, the business service cluster includes a configuration unit service cluster and an execution unit service cluster.
  • The configuration unit service cluster includes configuration service A, configuration service B, and other configuration services (corresponding to other business systems of other data consumers). Configuration service A performs the related configuration for the task of extracting the required data from vendor A (a data producer), denoted extraction task A; configuration service B performs the related configuration for the task of extracting the required data from vendor B (a data producer), denoted extraction task B.
  • The execution unit service cluster includes execution service A corresponding to configuration service A, execution service B corresponding to configuration service B, and other execution services corresponding to the other configuration services. Execution service A is configured to execute extraction task A generated by configuration service A and map the data extracted from vendor A to the database and/or cache (target location) of vendor C (the data consumer); execution service B is configured to execute extraction task B generated by configuration service B and map the data extracted from vendor B to the database and/or cache (target location) of vendor C.
  • The parameters required by the above configuration services can be entered by the user at the client, transmitted to the system over the network, and routed to the corresponding configuration service through the load balancing and gateway services in the system.
  • When the operation platform extracts data from the smart security system, it sends request information to the smart security system over the network, and the smart security system puts the extracted data in the response information and returns it to the operation platform.
  • The data consumer defines a REST interface based on Representational State Transfer (REST) and has the data producer exchange data with the data consumer through this REST interface.
  • REST can be used to standardize, at the Hypertext Transfer Protocol (HTTP) layer, how a client exchanges data with a server's application programming interface (API).
  • REST describes the rules for data interaction between client and server at the HTTP layer: the client completes one HTTP interaction by sending an HTTP(S) request to the server and receiving the server's response.
  • REST specifies two important aspects: the HTTP method used by the request and the link (URL) of the request.
  • Here, the data consumer acts as the client and the data producer acts as the server.
  • The REST-based network interface uses the URL corresponding to the user list (that is, the URL of the data producer) as the request link, and the operation corresponding to the interface
  • The data required by the data consumer, or the information provided to the data producer, is carried in the request parameters, and the data returned by the data producer based on the request parameters is stored in the return value.
  • The REST-based interface includes the following components:
  • Interface: represents the operation performed on the data producer;
  • Request parameters: the parameters requested from the data producer;
  • Return value: the data returned by the data producer based on the request parameters.
  • The request parameters and the return value use the JSON data format.
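  • The three components above (interface, request parameters, return value) can be modeled with the Python standard library. In this hedged sketch the interface is the HTTP method plus the producer's URL, the request parameters become a JSON body, and the return value is decoded from JSON; the class name, the helper names, and the URL `http://producer.example/api/extract` are hypothetical.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class RestCall:
    """A REST-based interface call: operation + link, plus request parameters."""
    method: str                     # HTTP method = the operation on the data producer
    url: str                        # request link = the data producer's URL
    request_params: Dict[str, Any]  # parameters requested from the data producer

    def to_http_request(self) -> urllib.request.Request:
        """Build (but do not send) an HTTP request whose body is the JSON parameters."""
        body = json.dumps(self.request_params).encode("utf-8")
        return urllib.request.Request(
            self.url,
            data=body,
            headers={"Content-Type": "application/json"},
            method=self.method,
        )


def parse_return_value(raw: bytes) -> Dict[str, Any]:
    """Decode the data producer's JSON return value."""
    return json.loads(raw.decode("utf-8"))


call = RestCall("POST", "http://producer.example/api/extract", {"pageindex": 1})
req = call.to_http_request()
```

  • Passing the built request to `urllib.request.urlopen` would send it; the sketch stops before the network call so it can run offline.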
  • The REST interface includes a REST-based authentication interface and a REST-based data extraction interface, where the authentication interface is configured to obtain authorization information for accessing the data producer, and the data extraction interface is configured to extract data from the data producer using the authorization information.
  • Both the authentication interface and the data extraction interface are REST interfaces with the same components as described above; they differ only in the functions they implement.
  • For the authentication interface, configuration service 2 is further configured such that:
  • the request parameter carries an authorization request, and
  • the return value carries the authorization information.
  • For example, as shown in Table 2, the data producer can add a secondary data producer (corresponding URL boe.com.cn/va/x1) by POST (POST denotes adding). The authorization request carries the registered user name "user" and the password "123456", and this information is used as the request parameter of the REST interface.
  • Based on the information provided in the request parameter, the data producer generates the corresponding authorization information "0a32d8de-4789-49a9-afd7-c5544894fdf5" and stores it in the return value of the REST interface.
  • After obtaining the authorization information, the data consumer can use it to send a data extraction request to the data producer to extract the required data. Since the data consumer usually needs to interact with multiple data producers, it can obtain the authorization information of each data producer in the above manner.
  • The authorization information is stored in a local database.
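  • The authentication exchange and the local storage of authorization information can be sketched as follows, assuming the producer returns the authorization information in a JSON field named `token` and the request uses `username`/`password` keys (both assumptions; the source gives only the values). The transport is injected so the flow runs without a network, and SQLite stands in for the unspecified local database.

```python
import json
import sqlite3
from typing import Callable, Optional

# A transport sends (method, url, JSON body) and returns the JSON response body.
# It is injected so the flow runs without a network; in practice it would wrap
# an HTTP client. This abstraction is hypothetical.
Transport = Callable[[str, str, str], str]


def authenticate(transport: Transport, url: str, username: str, password: str) -> str:
    """Call the authentication interface and return the authorization information."""
    request_params = json.dumps({"username": username, "password": password})
    return_value = json.loads(transport("POST", url, request_params))
    if "token" not in return_value:  # "token" is an assumed field name
        raise PermissionError(f"authorization rejected by {url}")
    return return_value["token"]


def _ensure_table(db: sqlite3.Connection) -> None:
    db.execute("CREATE TABLE IF NOT EXISTS producer_auth"
               " (producer_url TEXT PRIMARY KEY, token TEXT NOT NULL)")


def save_token(db: sqlite3.Connection, producer_url: str, token: str) -> None:
    """Keep one authorization record per data producer in the local database."""
    _ensure_table(db)
    db.execute("INSERT OR REPLACE INTO producer_auth VALUES (?, ?)", (producer_url, token))
    db.commit()


def load_token(db: sqlite3.Connection, producer_url: str) -> Optional[str]:
    _ensure_table(db)
    row = db.execute("SELECT token FROM producer_auth WHERE producer_url = ?",
                     (producer_url,)).fetchone()
    return row[0] if row else None
```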
  • Configuration service 2 can also be configured to:
  • Configuration service 2 obtains, from the data producer, the fields contained in the source data table where the required data is located, which can be implemented as follows:
  • the data extraction request carries the authorization information and information about the source data table; a piece of sample data from the source data table, returned in response to the data extraction request, is received; based on the sample
  • For the data extraction interface, the request parameter carries the authorization information and the data extraction request, and
  • the return value carries the data returned by the data producer in response to the data extraction request.
  • After obtaining the authorization information in Table 2, the data consumer sends a data extraction request carrying this authorization information to the data producer of Table 2 through the data extraction interface, and the data producer returns response information (containing a piece of sample data) through the data extraction interface.
  • Refer to Table 3 for the code for extracting data through the data extraction interface provided by the embodiment of the present disclosure.
  • The data consumer carries the authorization information and the information about the source data table in the request parameters of the REST interface (one page of the data producer's source data table is extracted at a time, and 20 pages of data are obtained in total); the information in these request parameters is also the information in the data extraction request. After receiving the data extraction request, the data producer returns the corresponding response information, whose content is placed in the return value of the REST interface.
  • The return value includes not only the data returned based on the request parameters ("name": "apple", "color": "red"), but also the status of the response ("ok", "200", where 200 indicates a successful response) and information about the source data table where the returned data is located ("pageindex": 2, "totalpage": 50, "totalsize": 10000). In this way the data producer tells the data consumer that the source data table is currently returning page 2 of 50 pages in total.
  • The piece of data currently returned is "name": "apple", "color": "red" (this piece of data is the sample data; if multiple pieces of data are returned, any one of them can be used as the sample data). From the sample data it can be determined that the source data table includes the two fields name and color, and from the values apple and red it can be determined that both fields are of character type. From the data in the return value and the information about the source data table where the data is located, the fields, field types, data volume, and so on of the source data table can be determined, which makes it easier to define the data source of the extraction task accurately.
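  • The schema inference described above (fields and field types from one sample record, data volume from the paging information) might look like this. The response layout, including the `code`, `status`, and `data` keys and the type labels, is an assumption modeled on the values quoted in the text; only `pageindex`, `totalpage`, and `totalsize` appear in the source.

```python
import json
from typing import Any, Dict

# Coarse type names for JSON values (hypothetical labels). bool is checked
# before int because bool is a subclass of int in Python.
_TYPE_NAMES = [(bool, "boolean"), (int, "integer"), (float, "float"), (str, "character")]


def infer_field_type(value: Any) -> str:
    for py_type, name in _TYPE_NAMES:
        if isinstance(value, py_type):
            return name
    return "unknown"


def describe_source_table(raw_return_value: str) -> Dict[str, Any]:
    """Derive fields, field types, and volume of the source table from one response."""
    body = json.loads(raw_return_value)
    if str(body.get("code")) != "200":  # 200 indicates a successful response
        raise RuntimeError(f"extraction failed: {body.get('status')}")
    sample = body["data"][0]  # any returned record can serve as the sample
    return {
        "fields": {name: infer_field_type(v) for name, v in sample.items()},
        "page": body["pageindex"],
        "total_pages": body["totalpage"],
        "total_records": body["totalsize"],
    }


response = json.dumps({
    "status": "ok", "code": "200",
    "pageindex": 2, "totalpage": 50, "totalsize": 10000,
    "data": [{"name": "apple", "color": "red"}],
})
info = describe_source_table(response)
```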
  • The configuration service is further configured to:
  • define the target data model, which includes the way the data extracted from the data producer is written into the target data table and the data format adopted for the field types of the fields in the target data table; establish the mapping relationship between the fields in the source data table and the fields in the target data table, together with the data format conversion method for each mapped field; configure the execution cycle and the data synchronization method of the extraction task; and establish the extraction task based on the target data model, the mapping relationship, and the corresponding data format conversion method.
  • The definition of the data source, the configuration of the target data model, and the establishment of the mapping relationship are completed through a graphical interface based on user operations.
  • User operations can be voice commands, touch commands, motion recognition, and the like.
  • FIG. 4 is a first schematic diagram of the graphical interface configuration for data source definition provided by an embodiment of the present disclosure.
  • As shown in FIG. 4, the parameters to be configured include: data source type (REST selected), template name (REST01), template description (REST01 description), request protocol (http and https supported; http selected), service address (10.10.85.33:7000), Pageindex (1), Pagesize (10), request method (POST), login API (isys/login), username (Username), password, and authentication method (JWT selected). The user only needs to enter these parameters in the graphical interface.
  • The parameters (request parameters) to be sent in the authentication interface include the user name test01 and the password, and the parameters (request parameters) to be sent in the data extraction interface specify extracting pages 1 to 10 of the source data table.
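  • The FIG. 4 form can be captured as a configuration object that derives the login URL and the per-page request parameters. The values come from the text; the field names, the `pageindex`/`pagesize` request keys, and sending the JWT as an `Authorization: Bearer` header (a common convention) are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RestSourceTemplate:
    """Data source definition from the FIG. 4 form (defaults are the values in the text)."""
    template_name: str = "REST01"
    protocol: str = "http"
    service_address: str = "10.10.85.33:7000"
    page_index: int = 1
    page_size: int = 10
    request_method: str = "POST"
    login_api: str = "isys/login"
    auth_method: str = "JWT"

    def login_url(self) -> str:
        """URL of the authentication interface."""
        return f"{self.protocol}://{self.service_address}/{self.login_api}"

    def page_requests(self, last_page: int) -> List[Dict[str, int]]:
        """Request parameters for pages page_index..last_page, inclusive."""
        return [{"pageindex": p, "pagesize": self.page_size}
                for p in range(self.page_index, last_page + 1)]

    def auth_headers(self, token: str) -> Dict[str, str]:
        """Assumed convention: a JWT travels as a bearer credential."""
        if self.auth_method != "JWT":
            raise NotImplementedError(self.auth_method)
        return {"Authorization": f"Bearer {token}"}


source = RestSourceTemplate()
requests_for_pages_1_to_10 = source.page_requests(10)
```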
  • FIG. 5 is a second schematic diagram of the graphical interface configuration for data source definition provided by an embodiment of the present disclosure.
  • The user can enter the source definition of the data producer through the graphical interfaces shown in FIG. 4 and FIG. 5, and the source definition of the source data table is completed based on this input.
  • FIG. 6 is a schematic diagram of the parameter configuration in the graphical interface for the target data model provided by the embodiment of the present disclosure.
  • The parameters include: data source type (POSTGRESQL selected), template name (pg_test), template description (description of pg_test), POSTGRESQL connection parameters (a parameter named sslmode with the value disable), POSTGRESQL address (10.10.85.33:5432), and authentication method (default selected).
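  • The target data source of FIG. 6 (with the database name from FIG. 7) can be turned into a libpq-style connection URI, where `sslmode=disable` is passed as a query parameter. This is a sketch: the function name is hypothetical, and the user name and password are placeholders, since the text selects the default authentication method without giving credentials.

```python
from typing import Dict
from urllib.parse import quote, urlencode


def postgres_uri(address: str, database: str, user: str, password: str,
                 params: Dict[str, str]) -> str:
    """Build a libpq-style URI: postgresql://user:password@host:port/dbname?key=value"""
    # Percent-encode credentials so characters such as "@" or ":" stay unambiguous.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}@"
    query = f"?{urlencode(params)}" if params else ""
    return f"postgresql://{credentials}{address}/{quote(database, safe='')}{query}"


# Values from FIG. 6 and FIG. 7; "etl_user" and "p@ss" are placeholders.
uri = postgres_uri("10.10.85.33:5432", "test", "etl_user", "p@ss", {"sslmode": "disable"})
print(uri)  # postgresql://etl_user:p%40ss@10.10.85.33:5432/test?sslmode=disable
```

  • A PostgreSQL driver built on libpq, such as psycopg2, generally accepts such a URI as its connection string.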
  • FIG. 7 is a schematic diagram of the parameter configuration in the graphical interface for the target data table provided by the embodiment of the present disclosure.
  • The parameters include: data source type (postgresql), data source (datasource-pg-dest), database name (the name of the target database, set to test), table name (tb_staff), writing method (insert), batch size, and so on.
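  • The `insert` writing method with a batch size can be sketched as batched parameterized inserts. SQLite from the standard library stands in for the PostgreSQL target so the example runs anywhere; with a PostgreSQL driver the placeholder would typically be `%s` instead of `?`. The function names and the columns of `tb_staff` are hypothetical.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Sequence, Tuple

Row = Tuple


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield successive lists of at most `size` rows."""
    if size <= 0:
        raise ValueError("batch size must be positive")
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def write_insert(conn: sqlite3.Connection, table: str, columns: Sequence[str],
                 rows: Iterable[Row], batch_size: int) -> int:
    """'insert' writing method: append rows batch by batch; return the batch count.

    Table and column names come from trusted configuration, not from extracted data.
    """
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    count = 0
    for chunk in batches(rows, batch_size):
        conn.executemany(sql, chunk)
        conn.commit()  # one transaction per batch
        count += 1
    return count
```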
  • Please refer to FIG. 8 for the graphical interface for configuring the mapping relationship between a source data table and a target data table provided by an embodiment of the present disclosure.
  • The graphical interface shown in FIG. 8 mainly includes four parts: setting the fields contained in the source data table (source fields) and their field types (type); setting the fields contained in the corresponding target data table (target fields) and their field types (type); setting which verification function, with which parameters, is used to verify each pair of corresponding source and target fields; and setting which conversion function, with which parameters, is used to convert the data.
  • In FIG. 8, the source fields are field 1 to field 3, with types type 1 to type 3; the target fields are field a to field c, with types type a to type c; and the verification function, the conversion function, and their parameters are set for each pair.
  • Through this graphical interface, users can directly configure the mapping relationship between each source field and each target field (each row is one mapping relationship) and their respective field types. If a source field and its mapped target field have different field types, a conversion function can be set, and a verification function can also be set to verify the conversion.
  • The verification function can be a verification rule, for example one that checks whether the corresponding data has the same meaning before and after conversion by the conversion function. Verifying the conversion result with the verification function prevents errors in the conversion function from producing wrong data in the target field, which improves the accuracy of data conversion.
  • The conversion function converts data from the field type of the source field to the field type of the target field, so that the data extracted from the data producer is automatically converted into the heterogeneous data format of the data consumer, improving the efficiency of heterogeneous data conversion.
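  • A row of the FIG. 8 mapping (source field, target field, optional conversion function, optional verification function) can be represented as follows. The date-format conversion and its round-trip verification are a hypothetical example of checking that data keeps the same meaning after conversion, not functions named in the source.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Any, Callable, Dict, List, Optional


@dataclass
class FieldMapping:
    source_field: str
    target_field: str
    convert: Optional[Callable[[Any], Any]] = None       # set when the types differ
    verify: Optional[Callable[[Any, Any], bool]] = None  # checks meaning is preserved


def apply_mappings(record: Dict[str, Any], mappings: List[FieldMapping]) -> Dict[str, Any]:
    """Map one source record to a target record, converting and verifying per field."""
    out: Dict[str, Any] = {}
    for m in mappings:
        src = record[m.source_field]
        dst = m.convert(src) if m.convert else src
        if m.verify and not m.verify(src, dst):
            raise ValueError(f"verification failed for {m.source_field!r}: {src!r} -> {dst!r}")
        out[m.target_field] = dst
    return out


# Hypothetical conversion: "YYYY/MM/DD" text -> date; verification: format it back.
def text_to_date(s: str) -> date:
    return datetime.strptime(s, "%Y/%m/%d").date()


def same_day(s: str, d: date) -> bool:
    return d.strftime("%Y/%m/%d") == s
```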
  • Please refer to FIG. 9 for the graphical interface for configuring an extraction task provided by an embodiment of the present disclosure.
  • The parameters include: task name, reminder, timing (for example, executing the data extraction task every 0.5 minutes), task description, execution user (for example, admin), execution node (for example, 10.10.85.33:9501), timeout period (for example, 43200 s), and synchronization method (full or incremental; full is selected in FIG. 9).
  • The user can configure the execution cycle of an extraction task through the graphical interface shown in FIG. 9.
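  • The FIG. 9 task parameters and the difference between full and incremental synchronization can be sketched as below. The values follow the text (0.5 minutes, 43200 s, admin, the execution node); the class name, the task name, and the `updated_at` watermark column used for incremental sync are assumptions, since the source does not say how increments are detected.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Dict, List, Optional


class SyncMode(Enum):
    FULL = "full"
    INCREMENTAL = "incremental"


@dataclass
class TaskConfig:
    """Fields of the FIG. 9 form."""
    name: str
    interval: timedelta  # "timing": how often the task runs
    timeout: timedelta
    execution_user: str
    execution_node: str
    sync_mode: SyncMode

    def next_runs(self, start: datetime, count: int) -> List[datetime]:
        """The next `count` run times after `start`."""
        return [start + i * self.interval for i in range(1, count + 1)]

    def select_rows(self, rows: List[Dict], watermark: Optional[datetime]) -> List[Dict]:
        """Full sync takes every row; incremental takes rows changed since the watermark."""
        if self.sync_mode is SyncMode.FULL or watermark is None:
            return list(rows)
        return [r for r in rows if r["updated_at"] > watermark]  # assumed column


task = TaskConfig(
    name="extract-vendor-a",  # hypothetical task name
    interval=timedelta(minutes=0.5),
    timeout=timedelta(seconds=43200),
    execution_user="admin",
    execution_node="10.10.85.33:9501",
    sync_mode=SyncMode.FULL,
)
```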
  • After completing the configurations in FIGS. 4 to 9, configuration service 2 creates an extraction task and provides it to execution service 3.
  • Execution service 3 is configured to:
  • generate the corresponding data extraction request according to the extraction task; send the data extraction request to the data producer through the data extraction interface and receive the data returned from the corresponding source data table; and convert the returned data into data in the target data table through the extraction task for storage.
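  • The execution steps above (generate the request, call the data extraction interface, convert the returned data, store it) can be sketched as one loop over pages. The request keys, the response keys, and the injected `transport` and `store` callables are assumptions; field-type conversion is left out to keep the loop short.

```python
import json
from typing import Callable, Dict, List

Transport = Callable[[str, str, str], str]  # (method, url, JSON body) -> JSON body


def run_extraction_task(transport: Transport, url: str, token: str,
                        field_map: Dict[str, str],
                        store: Callable[[List[Dict]], None]) -> int:
    """Execute one extraction task: request each page, map its records, store them.

    Returns the number of records stored.
    """
    page, total_pages, stored = 1, 1, 0
    while page <= total_pages:
        # The request parameters carry the authorization information and the request.
        params = json.dumps({"token": token, "pageindex": page})
        body = json.loads(transport("POST", url, params))
        total_pages = body["totalpage"]  # learned from the producer's response
        # Rename source fields to target fields (type conversion omitted here).
        rows = [{dst: rec[src] for src, dst in field_map.items()} for rec in body["data"]]
        store(rows)
        stored += len(rows)
        page += 1
    return stored
```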
  • FIG. 10 is a schematic diagram of a relationship between a configuration service and an execution service provided by an embodiment of the present disclosure.
  • By function, the configuration service may include the following main components: data source definition, target data model definition, data field mapping, and extraction task configuration.
  • The configuration of each of these components has been described above and is not repeated here.
  • Each time the configuration service creates an extraction task, the corresponding extraction task is sent to the execution service.
  • By function, execution service 3 may include the following main components: executor, task scheduling, data mapping conversion, and target data storage.
  • Each extraction task established by configuration service 2 has a corresponding executor in execution service 3, and task scheduling determines which extraction task's executor runs.
  • For example, when the data producer is manufacturer A,
  • task scheduling runs the executor of the extraction task corresponding to manufacturer A, which generates the corresponding data extraction request; after authorization through the authentication interface, the data extraction request is sent through the data extraction interface.
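  • Task scheduling decides which extraction task's executor runs and when. A minimal time-ordered dispatcher using a heap is one way to sketch this; the class, the task IDs, and the fixed-interval rescheduling policy are assumptions, and a real execution service would also enforce timeouts and run executors concurrently.

```python
import heapq
from typing import Callable, Dict, List, Tuple


class TaskScheduler:
    """Runs each extraction task's executor when its next run time arrives."""

    def __init__(self) -> None:
        self._queue: List[Tuple[float, str]] = []  # (next run time, task id)
        self._executors: Dict[str, Callable[[], None]] = {}
        self._intervals: Dict[str, float] = {}

    def add(self, task_id: str, executor: Callable[[], None],
            interval_s: float, first_run: float) -> None:
        if interval_s <= 0:
            raise ValueError("interval must be positive")
        self._executors[task_id] = executor
        self._intervals[task_id] = interval_s
        heapq.heappush(self._queue, (first_run, task_id))

    def run_due(self, now: float) -> List[str]:
        """Run every task due at or before `now`, reschedule it, and report the order."""
        ran: List[str] = []
        while self._queue and self._queue[0][0] <= now:
            when, task_id = heapq.heappop(self._queue)
            self._executors[task_id]()  # e.g. the executor for manufacturer A's task
            ran.append(task_id)
            heapq.heappush(self._queue, (when + self._intervals[task_id], task_id))
        return ran
```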
  • Although the configuration service and the execution service are each divided into the four functions shown in FIG. 10, in practical applications they may also be divided into one, two, three, or even more functions; the configuration service and the execution service should therefore not be understood as limited to the four functions shown in FIG. 10.
  • For example, the operation analysis platform currently needs the equipment data of smart security (manufacturer A) and the power data of smart energy (manufacturer B).
  • Table 4 shows the code for data exchange with vendor A and vendor B based on the REST interface.
  • Vendor A and vendor B can be defined as metadata sources for vendor C's target database (for example, sourceDs1 and sourceDs2, respectively), and the tables in the target database corresponding to vendor A and vendor B can be defined (for example, destDs1 and destDs2, respectively). sourceDs1 is mapped to destDs1 to create extraction task 1, sourceDs2 is mapped to destDs2 to create extraction task 2, and extraction task 1 and extraction task 2 are executed separately.
  • The corresponding executors then automatically extract data from manufacturers A and B, map it to the target data tables, and store the target data tables in the target location.
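  • The vendor A/vendor B example reduces to one extraction task per source-to-target mapping, each executed separately. The names `sourceDs1`, `sourceDs2`, `destDs1`, and `destDs2` come from the text; the helper functions and the sample records are hypothetical, and `fetch` stands in for the REST-based extraction performed by the executors.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ExtractionTask:
    task_id: str
    source: str  # data source defined for a vendor, e.g. sourceDs1
    dest: str    # corresponding table definition in vendor C's database, e.g. destDs1


def build_tasks(source_to_dest: Dict[str, str]) -> List[ExtractionTask]:
    """Create one extraction task per source -> destination mapping."""
    return [ExtractionTask(f"extraction task {i}", src, dst)
            for i, (src, dst) in enumerate(source_to_dest.items(), start=1)]


def run_all(tasks: List[ExtractionTask], fetch: Callable[[str], List[Dict]],
            target_db: Dict[str, List[Dict]]) -> None:
    """Execute each task separately: fetch from its source, append to its table."""
    for task in tasks:
        target_db.setdefault(task.dest, []).extend(fetch(task.source))


tasks = build_tasks({"sourceDs1": "destDs1", "sourceDs2": "destDs2"})
```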
  • In the embodiments of the present disclosure, a configuration service is configured to perform, based on the REST interface, the related configuration for extracting data from a corresponding data producer; a corresponding execution service is configured to execute the extraction task, obtain the extracted data through the REST interface, and map the extracted data to the target location.
  • The data consumer can act as the client and send an HTTP request to the data producer acting as the server, so that the data producer returns the data to be extracted by the data consumer according to the REST interface defined by the data consumer, and the data consumer maps the extracted data to the target data table. In this way, the data consumer obtains the required data without intruding into the database of the data producer, which improves the security of data extraction; moreover, there is no need to deploy interface programs at the data consumer and the data producers or to develop interface programs separately, which reduces the investment of manpower and time and thus the development costs of enterprises.
  • the data consumer configures the relevant parameters through the graphical interface before data extraction, the user's ease of operation is improved, and the operation is simple and convenient.
  • the general data extraction system is developed based on the microservice architecture, and the configuration service and execution service are all registered in the service registration center, so that configuration services for different data producers.
  • the rapid launch of the execution service improves the autonomy and independence of each service, allowing services for new data producers to be released and launched quickly without worrying about a wide range of impacts and spillovers on other system functions.
  • the above-mentioned services can exist in the form of components, so that they can be reused and reorganized, and new applications for data extraction can be quickly formed and released.
  • some services in the data extraction application can be expanded in a targeted manner to solve performance bottlenecks. Components corresponding to a service in a microservice can be replaced or restored independently.
  • the system using the microservice architecture system has unparalleled advantages in development efficiency, stability, and scalability, ensuring high availability and high concurrency of services.
  • the data extraction application running on the system can be launched quickly, which means that the speed and efficiency are improved, and independent expansion and recovery can be achieved, which means that the system is more secure, stable, and scalable.
  • each of the above services can be deployed independently in a container, and a container is likewise a small, cross-platform, independently running execution unit. Therefore, in the embodiments of the present disclosure, each service is deployed in containerized form, and the services in the entire microservice architecture, together with the environments they depend on, can be packaged as container images for deployment. A container only needs to encapsulate the service and the dependency files it requires, which yields a lightweight runtime environment with higher hardware resource utilization than a virtual machine. Furthermore, different services in applications that depend on the above services can be isolated from each other, enabling one-click deployment of services and greatly reducing the workload of operations and maintenance staff.
  • embodiments of the present disclosure may be provided as methods, systems, or computer program products. Accordingly, embodiments of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • Embodiments of the present disclosure are described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each procedure and/or block in the flowcharts and/or block diagrams, and combinations of procedures and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing equipment to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing equipment produce an apparatus for implementing the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, the instruction means implementing the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

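A general data extraction system is disclosed. It is applied to a data consumer, is based on a microservice architecture, and includes a service registration center. The system includes at least one configuration service, each configuration service being configured to perform relevant configuration of an extraction task that extracts data from a corresponding data producer. It also includes at least one execution service, each execution service being configured to execute the extraction task and map the extracted data to a target location. The configuration services and execution services are all registered with the microservice architecture, and all use a REST interface to transmit data when communicating with the data producer.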

Description

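General Data Extraction System
Technical Field
The present disclosure relates to the field of data extraction, and in particular to a general data extraction system.
Background
With the development of enterprise informatization, information systems are increasingly used in daily work, and different business departments, production lines, and enterprises build their own business systems.
The database, as the core asset of a business system, has become a core asset of the enterprise and is usually not opened to third parties. Even business departments within the same enterprise often cannot exchange business data for historical reasons, such as using different programming languages or having different confidentiality requirements. However, as informatization deepens, a large project usually needs to integrate business data provided by different enterprises or business departments (data producers), yet the processes of these businesses are not interoperable and their data is not shared. This creates great difficulties for users developing large projects (data consumers) in data analysis and utilization, report development, and analysis and mining.
In the prior art, data warehouse technology is usually used to extract data from the databases of different business systems, process and integrate it, and then store it in a local database. The drawbacks of this solution are that mature data extraction tools are expensive and hard for ordinary enterprises to afford, while open-source tools require intrusion into the enterprise's database, which enterprises usually do not allow. Moreover, with this solution the data consumer must develop data synchronization interfaces for each business system, and the data producers must call those interfaces to perform data extraction and data reporting. As a result, the data consumer has to develop interfaces frequently, and centralized reporting by the data producers puts considerable data pressure on the data consumer.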
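Summary
The present disclosure provides a general data extraction system to solve the above technical problems in the prior art.
In a first aspect, to solve the above technical problems, an embodiment of the present disclosure provides a general data extraction system applied to a data consumer. The system is based on a microservice architecture and includes a service registration center. The technical solution of the system is as follows:
at least one configuration service, each configuration service being configured to perform relevant configuration of an extraction task for extracting data from a corresponding data producer;
at least one execution service, each execution service being configured to execute the extraction task and map the extracted data to a target location; wherein the configuration service and the execution service are both registered with the microservice architecture, and both use the REST interface to transmit data when communicating with the data producer.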
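In a possible implementation, the REST interface includes:
a REST-based authentication interface and a REST-based data extraction interface; wherein the authentication interface is configured to obtain authorization information for accessing the data producer, and the data extraction interface is configured to extract data from the data producer using the authorization information.
In a possible implementation, the configuration service is further configured to:
send authorization request information to the data producer through the authentication interface, wherein the authorization request carries the username and user password required for registration of the data consumer;
receive the authorization information returned by the data producer based on the authorization request information, wherein the authorization information is generated based on the username and the user password.
In a possible implementation, the configuration service is further configured to:
obtain, from the data producer, the fields contained in the source data table in which the required data is located;
map the fields contained in the source data table to fields of a target data table of the data consumer, and establish a corresponding extraction task.
In a possible implementation, the configuration service is further configured to:
send a data extraction request to the data producer through the data extraction interface, wherein the data extraction request carries the authorization information and information related to the source data table;
receive one piece of sample data from the source data table returned based on the data extraction request;
obtain, based on the sample data, the fields contained in the source data table and the corresponding field types.
In a possible implementation, the configuration service is further configured to:
define the data source of the extraction task based on the data producer and the source data table, and define a target data model corresponding to the target data table; wherein the target data model includes the manner in which data extracted from the data producer is written into the target data table, and the data format used for the field types of the fields in the target data table;
establish a mapping relationship between fields in the source data table and fields in the target data table, and a data format conversion method for the mapped fields;
configure the execution cycle and data synchronization mode of the extraction task, and establish the extraction task based on the target data model, the mapping relationship, and the corresponding data format conversion method.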
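In a possible implementation, the execution service is configured to:
generate a corresponding data extraction request according to the extraction task;
send the data extraction request to the data producer through the data extraction interface, and receive the correspondingly returned data in the source data table;
convert, through the extraction task, the returned data into data in the target data table for storage.
In a possible implementation, the REST interface includes:
a user list (URL), used to store the web address of the data producer;
an interface (Method), used to represent the operation performed on the data producer;
request parameters, used to represent the parameters requested from the data producer;
a return value, used to represent the data returned by the data producer based on the request parameters.
In a possible implementation, the request parameters and the return value use the JSON data format.
In a possible implementation, when the REST interface is the authentication interface, the request parameters carry the authorization request, and the return value carries the authorization information.
In a possible implementation, when the REST interface is the data extraction interface, the request parameters carry the authorization information and the data extraction request, and the return value carries the data returned by the data producer based on the data extraction request.
In a possible implementation, the definition of the data source, the configuration of the target data model, and the establishment of the mapping relationship are completed through a graphical interface based on user operations.
In a possible implementation, the configuration service and the execution service both use container technology, and each configuration service and each execution service runs in its own corresponding container.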
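Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a general data extraction system according to an embodiment of the present disclosure;
FIG. 2 is a diagram of the relationship between the data consumer and the data producers according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of the system used by the data consumer's operations platform according to an embodiment of the present disclosure;
FIG. 4 is a first schematic configuration diagram of the graphical interface for data source definition according to an embodiment of the present disclosure;
FIG. 5 is a second schematic configuration diagram of the graphical interface for data source definition according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of parameter configuration in the graphical interface for the target data model according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of parameter configuration in the graphical interface for the target data table according to an embodiment of the present disclosure;
FIG. 8 shows the graphical interface for configuring the mapping relationship between the source data table and the target data table according to an embodiment of the present disclosure;
FIG. 9 shows the graphical interface for configuring an extraction task according to an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of the relationship between the configuration service and the execution service according to an embodiment of the present disclosure.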
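Detailed Description
Embodiments of the present disclosure provide a general data extraction system to solve the above technical problems in the prior art.
For a better understanding of the above technical solutions, they are described in detail below with reference to the drawings and specific embodiments. It should be understood that the embodiments of the present disclosure and the specific features therein are detailed descriptions of the technical solutions of the present disclosure rather than limitations on them. Where there is no conflict, the embodiments and the technical features in the embodiments may be combined with each other.
Referring to FIG. 1, an embodiment of the present disclosure provides a general data extraction system applied to a data consumer. The system is based on a microservice architecture and includes a service registration center 1. The system further includes:
at least one configuration service 2, each configuration service 2 being configured to perform relevant configuration of an extraction task for extracting data from a corresponding data producer, wherein the relevant configuration includes a REST-based data structure;
at least one execution service 3, each execution service 3 being configured to execute the extraction task and map the extracted data to a target location; wherein the configuration services and the execution services are all registered with the microservice architecture and use the REST interface to transmit data when communicating with data producers.
A data producer may be the database of an external enterprise, institution, or business department. The data consumer may be the party that needs to integrate part of the data in those databases. FIG. 2 shows the relationship between the data consumer and the data producers according to an embodiment of the present disclosure.
In FIG. 2, vendor C, as the data consumer, needs to build an operations platform. The data in that platform's database must be extracted from the smart security database of vendor A and the smart energy database of vendor B, both of which act as data producers.
FIG. 3 is a schematic structural diagram of the system used by the data consumer's operations platform according to an embodiment of the present disclosure. The system is based on a microservice architecture and includes a registry cluster, which may consist of multiple service registration centers 1. Each service registration center 1 consists of servers that provide registration services for the services in the system, and different service registration centers 1 may provide different types of registration services.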
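In a microservice architecture, Spring Eureka (Eureka as integrated through Spring Cloud Netflix, a tool providing service registration and discovery) is commonly used as the registration center 1. Through it, one can query the services contained in the system (including configuration services, execution services, etc.) together with their registration status, availability, and state, and thereby manage the registered services.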
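To show what the registration center does for the configuration and execution services, here is a minimal in-memory sketch, assuming heartbeat-based availability. It is not Eureka's API. The class names, method names, and the TTL rule are hypothetical and serve only to illustrate registration, discovery, and status checks.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ServiceInstance:
    """One registered instance of a service (hypothetical model)."""
    name: str      # e.g. "config-service-A" or "exec-service-A"
    address: str   # host:port where the instance listens
    last_heartbeat: float = field(default_factory=time.monotonic)


class ServiceRegistry:
    """Toy in-memory registry: register, heartbeat, discover, list."""

    def __init__(self, ttl_seconds: float = 30.0):
        self.ttl = ttl_seconds
        self._instances: dict[str, ServiceInstance] = {}

    def register(self, name: str, address: str) -> None:
        self._instances[f"{name}@{address}"] = ServiceInstance(name, address)

    def heartbeat(self, name: str, address: str) -> None:
        # Renewing the heartbeat keeps the instance marked as available.
        self._instances[f"{name}@{address}"].last_heartbeat = time.monotonic()

    def _is_up(self, inst: ServiceInstance, now: float) -> bool:
        # An instance counts as available if it renewed within the TTL.
        return now - inst.last_heartbeat <= self.ttl

    def discover(self, name: str) -> list[str]:
        """Return the addresses of available instances of one service."""
        now = time.monotonic()
        return [i.address for i in self._instances.values()
                if i.name == name and self._is_up(i, now)]

    def list_services(self) -> list[str]:
        """Names of all registered services, available or not."""
        return sorted({i.name for i in self._instances.values()})
```

A real deployment would rely on Eureka's own registration and lease-renewal mechanism. The sketch only makes the "query services and their availability" role concrete.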
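The system further includes a business service cluster. Based on the service list obtained from the registry cluster, the business service cluster can be determined to include a configuration unit service cluster and an execution unit service cluster. The configuration unit service cluster includes configuration service A, configuration service B, and other configuration services (corresponding to other business systems of other data producers). Configuration service A configures the extraction task for extracting the required data from vendor A (a data producer), denoted extraction task A. Configuration service B configures the extraction task for extracting the required data from vendor B (a data producer), denoted extraction task B. The execution unit service cluster includes execution service A corresponding to configuration service A, execution service B corresponding to configuration service B, and other execution services corresponding to the other configuration services. Execution service A is configured to execute extraction task A generated by configuration service A and map the data extracted from vendor A into the database and/or cache of vendor C (the data consumer), which is the target location. Execution service B is configured to execute extraction task B generated by configuration service B and map the data extracted from vendor B into the database and/or cache of vendor C (the target location).
The relevant parameters required by the above configuration services can be entered by the user on the client side, transmitted to the system over the network, and passed to the corresponding configuration service through the system's load balancing and gateway services.
When the operations platform extracts data from the smart security system, it sends request information to the smart security system over the network, and the smart security system places the extracted data in response information returned to the operations platform.
To implement this communication, the data consumer defines a REST interface based on Representational State Transfer (REST), and the data producers also exchange data with the data consumer through this REST interface.
REST can be used to standardize how a client exchanges data with a server's Application Programming Interface (API) at the Hypertext Transfer Protocol (HTTP) layer. REST describes the data exchange rules between client and server at the HTTP layer: the client sends an HTTP(S) request to the server and receives the server's response, completing one HTTP interaction. In this interaction, REST specifies two important aspects, namely the method used by the HTTP request and the request link.
In the present disclosure, the data consumer acts as the client and the data producer acts as the server. The REST-based network interface uses the web address in the user list (that is, the address of the data producer) as the request link, and uses the operation corresponding to the interface as the method of the HTTP request. The data required by the data consumer, or the information provided to the data producer, is given in the request parameters, and the data returned by the data producer based on the request parameters is placed in the return value.
Specifically, the REST interface includes the following components:
a user list (URL), used to store the web address of the data producer;
an interface (Method), used to represent the operation performed on the data producer;
request parameters, used to represent the parameters requested from the data producer;
a return value, used to represent the data returned by the data producer based on the request parameters.
The request parameters and the return value use the JSON data format.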
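The four components above can be sketched as a single record whose request parameters and return value travel as JSON. This is a minimal illustration, assuming Python field names (`url`, `method`, `request_params`, `return_value`) that the disclosure does not specify.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RestInterface:
    """User list (URL), interface (Method), request parameters, return value.

    Field names are illustrative; the disclosure only names the components.
    """
    url: str                      # "user list": web address of the data producer
    method: str                   # "interface": HTTP method such as "GET" or "POST"
    request_params: dict[str, Any] = field(default_factory=dict)
    return_value: dict[str, Any] = field(default_factory=dict)

    def request_body(self) -> str:
        """Serialize the request parameters as JSON."""
        return json.dumps(self.request_params, ensure_ascii=False)

    def load_return_value(self, raw: str) -> dict[str, Any]:
        """Parse a JSON response body and keep it as the return value."""
        self.return_value = json.loads(raw)
        return self.return_value
```

The authentication interface and the data extraction interface described next can both be expressed as instances of this one record, because they differ only in what the parameters and return value carry.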
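Table 1 shows the definition of the REST-based network interface according to an embodiment of the present disclosure.
Table 1
Name  Content
User list (URL)  
Interface (Method)  
Request parameters  
Return value  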
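The REST interface includes a REST-based authentication interface and a REST-based data extraction interface. The authentication interface is configured to obtain authorization information for accessing the data producer, and the data extraction interface is configured to extract data from the data producer using the authorization information.
It should be noted that the authentication interface and the data extraction interface are both REST interfaces and have the same components as the REST interface; they differ only in the functions they implement.
To extract data from a data producer legitimately, the data consumer first needs to register with the data producer. In this case, configuration service 2 is further configured to:
send authorization request information to the data producer through the authentication interface, wherein the authorization request carries the username and user password required for registration of the data consumer; and receive the authorization information returned by the data producer based on the authorization request information, wherein the authorization information is generated based on the username and the user password.
When the REST interface is the authentication interface, the request parameters carry the authorization request, and the return value carries the authorization information.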
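The authentication exchange can be sketched with the standard library. The request is built but never sent, so the example runs offline. The JSON keys `username`, `password`, and `token` and the `Bearer` header scheme are assumptions, since Table 2 is only available as an image.

```python
import json
import urllib.request


def build_auth_request(login_url: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the producer's authentication interface."""
    # Assumed request-parameter keys; the producer defines the real ones.
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        login_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_auth_response(raw: bytes) -> str:
    """Extract the authorization information (a token) from the JSON return value."""
    payload = json.loads(raw)
    token = payload.get("token")  # assumed key name
    if not token:
        raise ValueError("authentication response carried no token")
    return token


def with_token(req: urllib.request.Request, token: str) -> urllib.request.Request:
    """Attach the token to a later data extraction request (assumed Bearer scheme)."""
    req.add_header("Authorization", f"Bearer {token}")
    return req
```

Actually sending the request would be done with `urllib.request.urlopen(req)` against the producer's real login address.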
Table 2 shows the code for registration through the REST interface according to an embodiment of the present disclosure.
Table 2
[Table 2 is reproduced as an image in the original publication (PCTCN2021098638-appb-000001).]
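Using the content shown in Table 2, the data consumer adds (POST denotes adding) a registration at the data producer, whose address is boe.com.cn/va/x1. The authorization request information carries the registered username "user" and the password "123456", and this information serves as the request parameters of the REST interface. The data producer generates the corresponding authorization information "0a32d8de-4789-49a9-afd7-c5544894fdf5" from the information in the request parameters and places it, as the response information, in the return value of the REST interface. Once the data consumer has obtained this authorization information, it can use it to send data extraction requests for the required data to the data producer. Because the data consumer usually exchanges data with multiple data producers, it can obtain authorization information from each of them in this way. For ease of management, the data consumer can store the authorization information obtained from each data producer in a local database.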
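Storing one token per data producer in a local database can be sketched with `sqlite3`. This is a sketch under the assumption that tokens are keyed by producer address. The table and column names (`producer_token`, `producer_url`, `token`) are hypothetical.

```python
import sqlite3
from typing import Optional


class TokenStore:
    """Keep one authorization token per data producer in a local database."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS producer_token ("
            " producer_url TEXT PRIMARY KEY,"
            " token TEXT NOT NULL)"
        )

    def save(self, producer_url: str, token: str) -> None:
        # Re-authenticating with the same producer replaces its previous token.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO producer_token (producer_url, token) VALUES (?, ?)",
                (producer_url, token),
            )

    def get(self, producer_url: str) -> Optional[str]:
        """Return the stored token, or None if the producer has not been authorized."""
        row = self.conn.execute(
            "SELECT token FROM producer_token WHERE producer_url = ?",
            (producer_url,),
        ).fetchone()
        return row[0] if row else None
```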
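After obtaining the data producer's authorization information, the data consumer also needs to obtain the fields and field types of the data producer's source data table in order to establish the corresponding extraction task. To this end, configuration service 2 can further be configured to:
obtain from the data producer the fields contained in the source data table in which the required data is located; and map those fields to the fields of the data consumer's target data table and establish the corresponding extraction task.
Configuration service 2 can obtain the fields of the source data table in the following way:
send a data extraction request to the data producer through the data extraction interface, wherein the data extraction request carries the authorization information and information related to the source data table; receive one piece of sample data from the source data table returned based on the data extraction request; and obtain, based on the sample data, the fields contained in the source data table and the corresponding field types.
When the REST interface is the data extraction interface, the request parameters carry the authorization information and the data extraction request, and the return value carries the data returned by the data producer based on the data extraction request.
Taking Table 2 as an example, after obtaining the authorization information in Table 2, the data consumer sends a data extraction request carrying that authorization information to the data producer in Table 2 through the data extraction interface. According to the request, the data producer returns response information (containing one piece of sample data) through the data extraction interface. Table 3 shows the code for extracting data through the data extraction interface according to an embodiment of the present disclosure.
Table 3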
[Table 3 is reproduced as an image in the original publication (PCTCN2021098638-appb-000002).]
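In Table 3, through the data extraction interface, the data consumer places the authorization information and information related to the source data table in the request parameters of the REST interface (data is to be extracted from the data producer's source data table 1 page at a time, 20 pages in total). The information in these request parameters is the content of the data extraction request. After receiving the request, the data producer returns the corresponding response information in the return value of the REST interface. The return value includes the data returned based on the request parameters ("name": "apple", "color": "red"). It also includes the status of the response ("ok" and "200", where 200 indicates success) and information about the source data table containing the returned data ("pageindex": 2, "totalpage": 50, "totalsize": 10000). Through the return value, the data producer tells the data consumer three things: the current response is page 2 of the source data table, the table has 50 pages and 10000 records in total, and the currently returned record is "name": "apple", "color": "red". This record is the sample data; if multiple records are returned, any one of them can serve as the sample. From the sample data, it can be determined that the source data table contains two fields, name and color. From the values apple and red, both fields can be determined to be of character (string) type. The returned data and the source-table information in the return value thus reveal the fields, field types, and data volume of the source data table, which makes it easier to define the data source of the extraction task accurately.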
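Inferring the source table's fields and field types from one sample record, as described above, can be sketched as follows. The response layout assumed here is a `"data"` list alongside `pageindex`, `totalpage`, and `totalsize`, and it is hypothetical because Table 3 survives only as an image. The type names returned are also illustrative.

```python
import json


def infer_field_types(sample: dict) -> dict[str, str]:
    """Guess a field type for each key of one sample record."""
    types = {}
    for name, value in sample.items():
        if isinstance(value, bool):      # check bool first: bool is a subclass of int
            types[name] = "boolean"
        elif isinstance(value, int):
            types[name] = "integer"
        elif isinstance(value, float):
            types[name] = "float"
        elif isinstance(value, str):
            types[name] = "string"
        else:
            types[name] = "unknown"
    return types


def summarize_response(raw: str) -> dict:
    """Pull paging information and the inferred schema out of a return value."""
    payload = json.loads(raw)
    records = payload["data"]            # assumed key holding the returned rows
    if not records:
        raise ValueError("response contained no sample record")
    sample = records[0]                  # any returned record can serve as the sample
    return {
        "page": payload["pageindex"],
        "total_pages": payload["totalpage"],
        "total_rows": payload["totalsize"],
        "fields": infer_field_types(sample),
    }
```

Inferring types from a single record is fragile. For example, a numeric column whose sample value happens to be null cannot be typed this way, so the result should be reviewed in the mapping step.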
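After the fields and field types of the source data table are obtained, the fields of the source data table can be mapped to the fields of the data consumer's target data table. Specifically, the configuration service is configured to:
define the data source of the extraction task based on the data producer and the source data table, and define the target data model corresponding to the target data table, wherein the target data model includes the manner in which data extracted from the data producer is written into the target data table and the data format used for the field types of the fields in the target data table; establish the mapping relationship between fields in the source data table and fields in the target data table, and the data format conversion method for the mapped fields; and configure the execution cycle and data synchronization mode of the extraction task, establishing the extraction task based on the target data model, the mapping relationship, and the corresponding data format conversion method.
The definition of the data source, the configuration of the target data model, and the establishment of the mapping relationship are completed through a graphical interface based on user operations. The user operations may be voice commands, touch commands, gesture recognition, and the like.
FIG. 4 is the first schematic configuration diagram of the graphical interface for data source definition according to an embodiment of the present disclosure.
In the graphical interface shown in FIG. 4, the parameters to be configured include: data source type (REST selected), template name (REST01), template description (REST01 description), request protocol (http and https supported; http selected), service address (10.10.85.33:7000), Pageindex (1), Pagesize (10), request method (POST), login API (isys/login), username (set to a username), password, and authentication method (JWT selected). The user only needs to enter these parameters in the graphical interface.
Based on the configuration in FIG. 4, the parameters requested in the authentication interface (request parameters) include the username test01 and the password, and the parameters requested in the data extraction interface (request parameters) specify extracting pages 1 to 10 of data from the source data table.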
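The FIG. 4 template fields combine into the address of the authentication interface. A minimal sketch, assuming the login API is a path relative to the service address. The dataclass and its field names are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urljoin


@dataclass
class RestSourceTemplate:
    """Fields of the FIG. 4 data-source template (names are illustrative)."""
    template_name: str
    protocol: str          # "http" or "https"
    service_address: str   # host:port
    login_api: str         # path of the authentication interface
    page_index: int
    page_size: int
    auth_method: str       # e.g. "JWT"

    def base_url(self) -> str:
        if self.protocol not in ("http", "https"):
            raise ValueError(f"unsupported protocol: {self.protocol}")
        return f"{self.protocol}://{self.service_address}/"

    def login_url(self) -> str:
        # urljoin resolves the relative login path against the service root.
        return urljoin(self.base_url(), self.login_api.lstrip("/"))
```

With the FIG. 4 values, `RestSourceTemplate("REST01", "http", "10.10.85.33:7000", "isys/login", 1, 10, "JWT").login_url()` yields `http://10.10.85.33:7000/isys/login`.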
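FIG. 5 is the second schematic configuration diagram of the graphical interface for data source definition according to an embodiment of the present disclosure.
In the graphical interface shown in FIG. 5, the parameters to be configured include: data source type (set to REST), data source (set to data-resource-ap), request API (set to /peds:nan/staffs), request method (GET selected), request parameters, and parameter preview. Under request parameters, the name and value of any parameter can be set, for example parameter Pageindex with value 1 and parameter Pagesize with value 20; existing parameters can be deleted with the minus button and new ones added with the plus button. The parameter preview shows the string generated from the configured parameters, Pageindex=1&Pagesize=20.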
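The parameter preview in FIG. 5 is an ordinary URL query string. The sketch below reproduces it with `urllib.parse.urlencode` and joins it to the request API. The base URL passed in is an assumption, since FIG. 5 refers to the data source only by the name data-resource-ap.

```python
from urllib.parse import urlencode


def preview_params(params: list[tuple[str, object]]) -> str:
    """Render configured request parameters as a query string, in the order entered."""
    return urlencode(params)


def build_get_url(base_url: str, request_api: str, params: list[tuple[str, object]]) -> str:
    """Combine the service root, the request API path, and the parameters into a GET URL."""
    path = request_api if request_api.startswith("/") else "/" + request_api
    query = preview_params(params)
    return f"{base_url.rstrip('/')}{path}" + (f"?{query}" if query else "")
```

A list of pairs is used rather than a dict so that the parameters keep the order in which the user entered them in the form.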
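The user enters the data producer's source definition through the graphical interfaces of FIG. 4 and FIG. 5, and the source definition of the source data table is completed based on this input.
FIG. 6 is a schematic diagram of parameter configuration in the graphical interface for the target data model according to an embodiment of the present disclosure.
In the graphical interface shown in FIG. 6, the parameters include: data source type (POSTGRESQL selected), template name (pg_test), template description (description of pg_test), POSTGRESQL connection parameters (parameter name sslmode, parameter value disable), POSTGRESQL address (10.10.85.33:5432), and authentication method (default selected). By setting these parameters in the graphical interface, the user completes the definition of the target data model.
FIG. 7 is a schematic diagram of parameter configuration in the graphical interface for the target data table according to an embodiment of the present disclosure.
In the graphical interface shown in FIG. 7, the parameters include: data source type (postgresql selected), data source (datasource-pg-dest selected), database name (the name of the target database, set to test), table name (set to tb_staff), write mode (set to insert), batch size, and so on. By configuring these parameters, the user defines which table in the target database the extracted data is written to. The user also defines whether the data is written by insertion or by another method such as overwriting, and the batch size used for writing. With insert mode, only newly added data needs to be written, which reduces the amount of data transferred. With overwrite mode, data that has changed can be updated.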
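The two write modes and the batch size can be sketched as follows. The target in FIG. 6 and FIG. 7 is PostgreSQL, but `sqlite3` stands in here so the example is self-contained. SQLite's `INSERT OR REPLACE` plays the role of overwrite; on PostgreSQL the usual equivalent is `INSERT ... ON CONFLICT ... DO UPDATE`. The function name and signature are hypothetical.

```python
import sqlite3
from typing import Iterable


def write_rows(conn: sqlite3.Connection, table: str, columns: list[str],
               rows: Iterable[tuple], mode: str = "insert", batch_size: int = 500) -> int:
    """Write rows into the target table in batches; return the number written.

    mode="insert":    plain INSERT, for appending newly added rows only.
    mode="overwrite": INSERT OR REPLACE, so a row whose primary key already
                      exists is replaced by its changed version.
    Table and column names come from trusted configuration, not user input,
    because they are interpolated into the SQL text.
    """
    verbs = {"insert": "INSERT", "overwrite": "INSERT OR REPLACE"}
    if mode not in verbs:
        raise ValueError(f"unknown write mode: {mode}")
    placeholders = ", ".join("?" for _ in columns)
    sql = f"{verbs[mode]} INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    written, batch = 0, []
    with conn:  # commit on success, roll back on error
        for row in rows:
            batch.append(row)
            if len(batch) == batch_size:
                conn.executemany(sql, batch)
                written += len(batch)
                batch = []
        if batch:  # flush the final partial batch
            conn.executemany(sql, batch)
            written += len(batch)
    return written
```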
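FIG. 8 shows the graphical interface for configuring the mapping relationship between the source data table and the target data table according to an embodiment of the present disclosure.
The graphical interface shown in FIG. 8 mainly consists of four parts:
setting the fields contained in the source data table (source fields) and their field types (types);
setting the corresponding fields in the target data table (target fields) and their field types;
specifying which validation function, with which parameters, is used to validate each source-target field pair;
specifying which conversion function, with which parameters, is used for data conversion.
FIG. 8 shows source fields (field 1 to field 3) with their types (type 1 to type 3) and target fields (field a to field c) with their types (type a to type c). The validation functions, conversion functions, and their parameter settings are not shown in detail in FIG. 8. Through this interface the user can directly configure the mapping relationship between each source field and each target field (each row is one mapping) and their respective field types. If a source field and its mapped target field have different field types, a conversion function can be set, and a validation function can also be set for checking.
The validation function may be a validation rule, for example one that checks whether the data has the same meaning before and after conversion by the conversion function. Validating the conversion result with the validation function prevents errors introduced during conversion from corrupting the data of the target field, which improves the accuracy of data conversion. The conversion function converts the field type of the source field into the field type of the target field. Data extracted from a data producer can thus be automatically converted from its heterogeneous source format into the data consumer's format, which improves the efficiency of handling heterogeneous data.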
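One row of the FIG. 8 mapping table can be sketched as a record holding a source field, a target field, an optional conversion function, and an optional validation function that compares the values before and after conversion. The names `FieldMapping`, `map_record`, and `same_number` are hypothetical. The validation rule shown is one possible "same meaning" check.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class FieldMapping:
    """One row of the mapping table (names are illustrative)."""
    source_field: str
    target_field: str
    convert: Optional[Callable[[Any], Any]] = None         # type conversion
    validate: Optional[Callable[[Any, Any], bool]] = None  # checks (before, after)


def map_record(record: dict, mappings: list[FieldMapping]) -> dict:
    """Turn one source record into one target record, validating each conversion."""
    out = {}
    for m in mappings:
        before = record[m.source_field]
        after = m.convert(before) if m.convert else before
        if m.validate and not m.validate(before, after):
            raise ValueError(
                f"validation failed for {m.source_field} -> {m.target_field}: {before!r}")
        out[m.target_field] = after
    return out


def same_number(before: Any, after: Any) -> bool:
    """Example rule: a numeric conversion must not change the value."""
    return float(before) == float(after)
```

With this rule, converting `"12"` to `12` passes, while truncating `"12.5"` to `12` is rejected instead of silently corrupting the target field.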
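FIG. 9 shows the graphical interface for configuring an extraction task according to an embodiment of the present disclosure.
In the graphical interface shown in FIG. 9, the parameters include: task name, notification recipient, schedule (for example, run the data extraction task every 5 minutes), task description, executing user (for example, admin), execution node (for example, 10.10.85.33.9501), timeout (for example, 43200 s), and synchronization mode (full or incremental; full is selected in FIG. 9). Through this interface the user can configure the execution cycle and related settings of an extraction task.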
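The task-level settings from FIG. 9 and the difference between full and incremental synchronization can be sketched as follows. The incremental branch assumes each source row carries an `updated_at` timestamp. That column, the dataclass, and its field names are hypothetical conventions that do not come from the disclosure. The timeout is kept only as configuration here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ExtractionTask:
    """Task-level settings from FIG. 9 (field names are illustrative)."""
    name: str
    interval: timedelta          # e.g. every 5 minutes
    timeout: timedelta           # e.g. 43200 s (stored as configuration only)
    sync_mode: str = "full"      # "full" or "incremental"
    last_sync: Optional[datetime] = None

    def next_run(self, now: datetime) -> datetime:
        # First run is immediate; afterwards, one interval after the last sync.
        return now if self.last_sync is None else self.last_sync + self.interval

    def select(self, rows: list[dict], now: datetime) -> list[dict]:
        """Pick the rows to synchronize and record the sync time.

        Incremental mode assumes a hypothetical "updated_at" datetime per row.
        """
        if self.sync_mode not in ("full", "incremental"):
            raise ValueError(f"unknown sync mode: {self.sync_mode}")
        if self.sync_mode == "full" or self.last_sync is None:
            picked = list(rows)  # a first incremental run is effectively a full load
        else:
            picked = [r for r in rows if r["updated_at"] > self.last_sync]
        self.last_sync = now
        return picked
```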
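After the configuration in FIG. 4 to FIG. 9 is completed, configuration service 2 has established an extraction task and provides it to execution service 3.
Execution service 3 is configured to:
generate a corresponding data extraction request according to the extraction task; send the data extraction request to the data producer through the data extraction interface and receive the correspondingly returned data in the source data table; and convert, through the extraction task, the returned data into data in the target data table for storage.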
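The three steps of the execution service can be sketched as a paging loop: request a page, convert its rows, store them, and repeat until the last page. The fetch, convert, and store steps are passed in as functions, so the sketch runs without a network. The response keys `data` and `totalpage` are assumptions based on the Table 3 description.

```python
from typing import Callable

# fetch_page(page_index, page_size, token) -> parsed JSON return value (assumed shape).
FetchPage = Callable[[int, int, str], dict]


def run_extraction(fetch_page: FetchPage, token: str, page_size: int,
                   convert: Callable[[dict], dict],
                   store: Callable[[list[dict]], None]) -> int:
    """Page through the source table, convert each record, and store it.

    Returns the number of records stored.
    """
    page, total_pages, stored = 1, 1, 0
    while page <= total_pages:
        payload = fetch_page(page, page_size, token)   # one data extraction request
        total_pages = payload["totalpage"]             # producer reports the page count
        rows = [convert(r) for r in payload["data"]]   # map to target-table fields
        store(rows)                                    # write into the target table
        stored += len(rows)
        page += 1
    return stored
```

In practice `fetch_page` would send an authenticated HTTP request to the data extraction interface, and `store` would call a batched writer for the target table.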
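FIG. 10 is a schematic diagram of the relationship between the configuration service and the execution service according to an embodiment of the present disclosure.
Continuing with the example of FIG. 2, the configuration service can be divided by function into four main components: data source definition, target data model definition, data field mapping, and extraction task configuration. The configuration of each of these components has been described above and is not repeated here. Each time the configuration service establishes an extraction task, that task is sent to the execution service.
Execution service 3 can be divided by function into four main components: executors, task scheduling, data mapping and conversion, and target data storage. Each extraction task established by configuration service 2 has a corresponding executor in execution service 3, and task scheduling determines which extraction task's executor runs. Suppose the execution service is extracting data from vendor A. Task scheduling then starts the executor of the extraction task corresponding to vendor A, which generates the corresponding data extraction request. After authorization through the authentication interface, the request is sent through the data extraction interface to vendor A's smart security system (the source data table), and data is extracted from it. The data is then mapped and its format converted according to the mapping relationship and field types between the source data table and the target data table configured in the configuration service. This produces the target data table, which is then saved (target data storage).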
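The "task scheduling decides which executor runs" step can be sketched as a priority queue of due times. Each extraction task registers an executor (a callable) and an interval. Times are plain numbers in seconds so the example is deterministic. The class and method names are hypothetical, and a production scheduler would also handle timeouts and failures.

```python
import heapq
from typing import Callable


class TaskScheduler:
    """Decide which extraction task's executor runs next (illustrative)."""

    def __init__(self):
        self._executors: dict[str, Callable[[], None]] = {}
        self._intervals: dict[str, float] = {}
        self._queue: list[tuple[float, str]] = []   # (due time, task name) min-heap

    def add_task(self, name: str, executor: Callable[[], None],
                 interval: float, start: float = 0.0) -> None:
        if interval <= 0:
            raise ValueError("interval must be positive")
        self._executors[name] = executor
        self._intervals[name] = interval
        heapq.heappush(self._queue, (start, name))

    def run_until(self, now: float) -> list[str]:
        """Run every executor due at or before `now`; return the names in run order."""
        ran = []
        while self._queue and self._queue[0][0] <= now:
            due, name = heapq.heappop(self._queue)
            self._executors[name]()                 # executor extracts, maps, stores
            ran.append(name)
            heapq.heappush(self._queue, (due + self._intervals[name], name))
        return ran
```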
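It should be noted that although FIG. 10 divides the configuration service and the execution service into four functions each, in practice they may be divided into one, two, three, or more functions. The configuration service and the execution service should therefore not be construed as limited to the four functions shown in FIG. 10.
Continuing with the example of FIG. 2, the operations analysis platform (vendor C) needs device data from the smart security system (vendor A) and electricity data from the smart energy system (vendor B). Table 4 shows example code for the data exchange over the REST interface between vendor C, as the data consumer, and vendors A and B, as data producers:
Table 4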
[Table 4 is reproduced as images in the original publication (PCTCN2021098638-appb-000003, PCTCN2021098638-appb-000004).]
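Based on the information in the REST interfaces in Table 4, vendor A and vendor B can be defined as data sources for vendor C's target database (for example, as sourceDs1 and sourceDs2 respectively). The tables in the target database corresponding to vendor A and vendor B can also be defined (for example, as destDs1 and destDs2). sourceDs1 is mapped to destDs1 to create extraction task 1, and sourceDs2 is mapped to destDs2 to create extraction task 2. Running the executors of extraction task 1 and extraction task 2 automatically extracts data from vendors A and B and maps it to the target data tables, which are then stored in the target location.
In the embodiments provided by the present disclosure, the data consumer's system includes at least one configuration service and at least one execution service. Each configuration service is configured to perform relevant configuration of an extraction task that extracts data from a corresponding data producer through the REST interface. The corresponding execution service is configured to execute the extraction task, obtain the extracted data through the REST interface, and map it to the target location. The REST interface is defined by the data consumer. The data consumer can therefore act as the client and send HTTP requests to the data producer acting as the server, and the data producer returns the data the consumer needs according to that interface. The data consumer then maps the extracted data to the target data table. The data consumer can thus obtain the required data without intruding into the data producer's database, which improves the security of data extraction. There is also no need to deploy interface programs on the data consumer and data producers or to develop interface programs separately, which reduces labor and time costs and thereby the enterprise's development costs. In addition, because the data consumer configures the relevant parameters through a graphical interface before data extraction, the system is easy to operate, and operation is simple and convenient.
Table 5 compares the advantages and disadvantages of the microservice-based system of the present disclosure with those of other systems.
Table 5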
[Table 5 is reproduced as images in the original publication (PCTCN2021098638-appb-000005, PCTCN2021098638-appb-000006).]
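In the embodiments provided by the present disclosure, the general data extraction system is developed on a microservice architecture, and the configuration services, execution services, and other services are all registered with the service registration center. As a result, configuration services and execution services for different data producers can be launched quickly. This improves the autonomy and independence of each service, so that services for new data producers can be released quickly without a wide-ranging impact on other system functions. Moreover, each of these services can exist as a component, so that components can be reused and recombined to quickly build and release new data extraction applications.
When more users access the system for data extraction, specific services in the data extraction application can be scaled out in a targeted manner to relieve performance bottlenecks. The component corresponding to any single service in the microservice architecture can be replaced or restored independently.
Further, a system built on a microservice architecture has unparalleled advantages in development efficiency, stability, and scalability, ensuring high availability and high concurrency of services. It also allows data extraction applications running on the system to be launched quickly, which improves speed and efficiency. It further allows independent scaling and recovery, which makes the system more secure, stable, and scalable.
Further, microservices are small, autonomous services that are released and deployed independently, so each of the above services can be deployed independently in a container. A container is likewise a small, cross-platform, independently running execution unit. Therefore, in the embodiments of the present disclosure, the services are deployed in containerized form: the services in the microservice architecture and the environments they depend on can be packaged as container images for deployment. A container only needs to encapsulate the service and the dependency files it requires. This yields a lightweight runtime environment with higher hardware resource utilization than a virtual machine. Different services in applications that depend on these services can thus be isolated from each other, enabling one-click deployment of services and greatly reducing the workload of operations and maintenance staff.
Those skilled in the art should understand that embodiments of the present disclosure may be provided as methods, systems, or computer program products. Accordingly, embodiments of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
Embodiments of the present disclosure are described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each procedure and/or block in the flowcharts and/or block diagrams, and combinations of procedures and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing equipment to produce a machine. The instructions executed by that processor then produce an apparatus for implementing the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner. The instructions stored in the computer-readable memory then produce an article of manufacture comprising instruction means, and the instruction means implement the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus, so that a series of operational steps is performed on the computer or other programmable apparatus to produce a computer-implemented process. The instructions executed on the computer or other programmable apparatus then provide steps for implementing the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.
Obviously, those skilled in the art can make various changes and variations to the present disclosure without departing from the spirit and scope of the present disclosure.
Thus, if these modifications and variations of the present disclosure fall within the scope of the claims of the present disclosure and their technical equivalents, the present disclosure is also intended to include them.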

Claims (13)

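  1. A general data extraction system, applied to a data consumer, the system being based on a microservice architecture and comprising a service registration center, wherein the system comprises:
    at least one configuration service, each configuration service being configured to perform relevant configuration of an extraction task for extracting data from a corresponding data producer;
    at least one execution service, each execution service being configured to execute the extraction task and map the extracted data to a target location; wherein the configuration service and the execution service are both registered with the microservice architecture, and both use a REST interface to transmit data when communicating with the data producer.
  2. The system according to claim 1, wherein the REST interface comprises:
    a REST-based authentication interface and a REST-based data extraction interface; wherein the authentication interface is configured to obtain authorization information for accessing the data producer, and the data extraction interface is configured to extract data from the data producer using the authorization information.
  3. The system according to claim 2, wherein the configuration service is further configured to:
    send authorization request information to the data producer through the authentication interface, wherein the authorization request carries the username and user password required for registration of the data consumer;
    receive the authorization information returned by the data producer based on the authorization request information, wherein the authorization information is generated based on the username and the user password.
  4. The system according to claim 3, wherein the configuration service is further configured to:
    obtain, from the data producer, the fields contained in a source data table in which the required data is located;
    map the fields contained in the source data table to fields of a target data table of the data consumer, and establish a corresponding extraction task.
  5. The system according to claim 4, wherein the configuration service is further configured to:
    send a data extraction request to the data producer through the data extraction interface, wherein the data extraction request carries the authorization information and information related to the source data table;
    receive one piece of sample data from the source data table returned based on the data extraction request;
    obtain, based on the sample data, the fields contained in the source data table and the corresponding field types.
  6. The system according to claim 5, wherein the configuration service is further configured to:
    define a data source of the extraction task based on the data producer and the source data table, and define a target data model corresponding to the target data table; wherein the target data model comprises the manner in which the data extracted from the data producer is written into the target data table, and the data format adopted for the field types of the fields in the target data table;
    establish a mapping relationship between fields in the source data table and fields in the target data table, and a data format conversion method for the mapped fields;
    configure an execution cycle and a data synchronization mode of the extraction task, and establish the extraction task based on the target data model, the mapping relationship, and the corresponding data format conversion method.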
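  7. The system according to any one of claims 1-6, wherein the execution service is configured to:
    generate a corresponding data extraction request according to the extraction task;
    send the data extraction request to the data producer through the data extraction interface, and receive the correspondingly returned data in the source data table;
    convert, through the extraction task, the returned data into data in the target data table for storage.
  8. The system according to any one of claims 1-6, wherein the REST interface comprises:
    a user list (URL), used to store the web address of the data producer;
    an interface (Method), used to represent the operation performed on the data producer;
    request parameters, used to represent the parameters requested from the data producer;
    a return value, used to represent the data returned by the data producer based on the request parameters.
  9. The system according to claim 8, wherein the request parameters and the return value use the JSON data format.
  10. The system according to claim 9, wherein, when the REST interface is the authentication interface, the request parameters carry the authorization request, and the return value carries the authorization information.
  11. The system according to claim 9, wherein, when the REST interface is the data extraction interface, the request parameters carry the authorization information and the data extraction request, and the return value carries the data returned by the data producer based on the data extraction request.
  12. The system according to claim 6, wherein the definition of the data source, the configuration of the target data model, and the establishment of the mapping relationship are completed through a graphical interface based on user operations.
  13. The system according to any one of claims 1-6, wherein the configuration service and the execution service both use container technology, and each configuration service and each execution service runs in its own corresponding container.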
PCT/CN2021/098638 2021-06-07 2021-06-07 General data extraction system WO2022256969A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180001459.1A CN115836284A (zh) 2021-06-07 2021-06-07 General data extraction system
PCT/CN2021/098638 WO2022256969A1 (zh) 2021-06-07 2021-06-07 General data extraction system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/098638 WO2022256969A1 (zh) 2021-06-07 2021-06-07 General data extraction system

Publications (1)

Publication Number Publication Date
WO2022256969A1 true WO2022256969A1 (zh) 2022-12-15

Family

ID=84424653

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/098638 WO2022256969A1 (zh) General data extraction system 2021-06-07 2021-06-07

Country Status (2)

Country Link
CN (1) CN115836284A (zh)
WO (1) WO2022256969A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107368503A (zh) * 2016-05-13 2017-11-21 北京京东尚科信息技术有限公司 Kettle-based data synchronization method and system
CN107622055A (zh) * 2016-07-13 2018-01-23 航天科工智慧产业发展有限公司 Method for quickly implementing data service publishing
US20180032706A1 (en) * 2016-08-01 2018-02-01 Palantir Technologies Inc. Secure deployment of a software package
CN109191008A (zh) * 2018-09-30 2019-01-11 江苏农牧科技职业学院 Microservice framework system for an aquatic product quality and safety supervision system
CN111680033A (zh) * 2020-04-30 2020-09-18 广州市城市规划勘测设计研究院 High-performance GIS platform
CN111752965A (zh) * 2020-05-29 2020-10-09 南京南瑞继保电气有限公司 Microservice-based real-time database data interaction method and system


Also Published As

Publication number Publication date
CN115836284A (zh) 2023-03-21

Similar Documents

Publication Publication Date Title
CN111831269A (zh) Application development system, operating method, device, and storage medium
US10291704B2 (en) Networked solutions integration using a cloud business object broker
US8127237B2 (en) Active business client
CN101388904B (zh) GIS service aggregation method, apparatus, and system
WO2018223214A1 (en) A permissioned blockchain development system based on open blockchain connector (obcc)
US10339164B2 (en) Data exchange in a collaborative environment
WO2010127552A1 (zh) Service-oriented application system and communication method thereof, creator, and creation method
US20190251096A1 (en) Synchronization of offline instances
CN109150964B (zh) Migratable data management method and service migration method
CN104090896B (zh) Method, apparatus, and system for importing data
Huang et al. A geospatial hybrid cloud platform based on multi-sourced computing and model resources for geosciences
Nan et al. Multimedia learning platform development and implementation based on cloud environment
CN107222575B (zh) Method for implementing OPC communication between industrial control devices
CN109218378B (zh) Cloud-platform-based design method for a small logistics management platform
CN113886055A (zh) Intelligent model training resource scheduling method based on container cloud technology
Anjos et al. BIGhybrid: a simulator for MapReduce applications in hybrid distributed infrastructures validated with the Grid5000 experimental platform
Zhang et al. Future manufacturing industry with cloud manufacturing
US10015049B2 (en) Configuration of network devices in a network
WO2022256969A1 (zh) General data extraction system
Zhang et al. Design of M2M Platform Based on J2EE and SOA
Shiau et al. A unified framework of the cloud computing service model
CN103399844A (zh) Report generation method and generation apparatus
US20190005255A1 (en) Protecting restricted information when importing and exporting resources
Ramisetty et al. Ontology integration for advanced manufacturing collaboration in cloud platforms
CN113822557A (zh) Data fusion management system, apparatus, electronic device, and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21944481

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 18567395

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE