CN111831713A - Data processing method, device and equipment - Google Patents

Data processing method, device and equipment

Info

Publication number
CN111831713A
Authority
CN
China
Prior art keywords
data
processing unit
target
conversion information
format
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910312700.7A
Other languages
Chinese (zh)
Inventor
周祥
王烨
李鸣翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201910312700.7A (CN111831713A)
Priority to PCT/CN2020/084423 (WO2020211717A1)
Publication of CN111831713A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a data processing method, device and equipment. The method comprises: acquiring a data processing request, wherein the data processing request comprises a first input format and a second output format; acquiring a target processing unit, wherein target conversion information of the target processing unit is first conversion information, and the first conversion information is used for realizing conversion between the first input format and the second output format; acquiring first data in the first input format from a data source according to the data processing request, and outputting the first data to the target processing unit so that the target processing unit converts the first data into second data in the second output format by using the first conversion information; and acquiring the second data from the target processing unit and outputting the second data. This technical solution can save the computing resources of the data lake analysis system and improve processing performance.

Description

Data processing method, device and equipment
Technical Field
The present application relates to the field of internet technologies, and in particular, to a data processing method, apparatus, and device.
Background
Data Lake Analytics provides users with a serverless query and analysis service. It can analyze and query massive data in any dimension, and supports high concurrency, low latency (millisecond response), real-time online analysis, massive data query, and similar functions.
A data lake analysis system includes a storage cluster and a computing cluster. The storage cluster includes different types of data sources, and these data sources use different data formats. The computing cluster includes multiple compute nodes, and different compute nodes may use different data formats. In general, a data source uses a different data format from a compute node, so the data format needs to be converted.
For example, data in data format A1 is read from a data source and converted into data in data format B1; the data in data format B1 is output to a compute node, and the compute node processes it. Because different types of data sources use different data formats and different compute nodes also use different data formats, the data lake analysis system must support conversion among many data formats. The system therefore has to provide a large amount of computing resources to perform the conversions, and the demand for computing resources grows as the number of users increases.
Disclosure of Invention
The application provides a data processing method, which comprises the following steps:
acquiring a data processing request, wherein the data processing request comprises a first input format and a second output format;
acquiring a target processing unit, wherein target conversion information of the target processing unit is first conversion information, and the first conversion information is used for realizing conversion between the first input format and the second output format;
acquiring first data in a first input format from a data source according to the data processing request, and outputting the first data to the target processing unit so that the target processing unit converts the first data into second data in a second output format by using the first conversion information;
and acquiring the second data from the target processing unit and outputting the second data.
The application provides a data processing method, which is applied to a data lake analysis system, wherein the data lake analysis system is used for providing a serverless data processing service for a user, and the method comprises the following steps:
acquiring a data processing request, wherein the data processing request comprises a first input format and a second output format;
obtaining a target processing unit from a plurality of processing units of the data lake analysis system; the target conversion information of the target processing unit is first conversion information, and the first conversion information is used for realizing conversion between the first input format and the second output format;
acquiring first data in a first input format from a data source according to the data processing request, and outputting the first data to the target processing unit so that the target processing unit converts the first data into second data in a second output format by using the first conversion information;
acquiring the second data from the target processing unit and outputting the second data;
wherein the data source comprises a cloud database provided by the data lake analysis system.
The application provides a data processing method, which comprises the following steps:
acquiring a data processing request, wherein the data processing request comprises a first input format and a second output format;
acquiring first data in a first input format from a data source according to the data processing request;
outputting the first data in the first input format to a target processing unit so that the target processing unit converts the first data into second data in a second output format;
and acquiring the second data from the target processing unit and outputting the second data.
The application provides a data processing method, which is applied to a data lake analysis system, wherein, for each processing unit among a plurality of processing units of the data lake analysis system, the processing unit comprises a plurality of different pieces of conversion information, and the different pieces of conversion information are used for realizing data conversion between different formats, and the method comprises the following steps:
the processing unit acquires first data in a first input format;
if the target conversion information of the processing unit is first conversion information and the first conversion information is used for realizing the conversion between the first input format and the second output format, converting the first data into second data in a second output format by using the first conversion information;
the processing unit outputs the second data.
The present application provides a data processing apparatus, the apparatus comprising:
an acquisition module, configured to acquire a data processing request, wherein the data processing request comprises a first input format and a second output format; and to acquire a target processing unit, wherein target conversion information of the target processing unit is first conversion information, and the first conversion information is used for realizing conversion between the first input format and the second output format;
a processing module, configured to acquire first data in the first input format from a data source according to the data processing request, and output the first data to the target processing unit so that the target processing unit converts the first data into second data in the second output format by using the first conversion information; and to acquire the second data from the target processing unit and output the second data.
The present application provides a data processing apparatus comprising:
a processor and a machine-readable storage medium having stored thereon a plurality of computer instructions, the processor when executing the computer instructions performs:
acquiring a data processing request, wherein the data processing request comprises a first input format and a second output format;
acquiring a target processing unit, wherein target conversion information of the target processing unit is first conversion information, and the first conversion information is used for realizing conversion between the first input format and the second output format;
acquiring first data in a first input format from a data source according to the data processing request, and outputting the first data to the target processing unit so that the target processing unit converts the first data into second data in a second output format by using the first conversion information;
and acquiring the second data from the target processing unit and outputting the second data.
Based on the above technical solution, in the embodiments of the present application, the target conversion information of the target processing unit is set to the first conversion information, so that the target processing unit converts the first data in the first input format into the second data in the second output format by using the first conversion information. That is, the data format conversion is performed by the target processing unit. The target processing unit is usually implemented by a logic chip, which has high processing performance. This saves the computing resources (such as CPU (Central Processing Unit) resources) of the data lake analysis system, improves its overall processing performance, usage efficiency and user experience, and accelerates data processing and computation. Hardware acceleration is used to handle data interfacing with the storage cluster and to provide a data interface to the computing cluster.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them.
FIG. 1 is a schematic flow chart diagram of a data processing method according to an embodiment of the present application;
FIG. 2 is a schematic block diagram of a data lake analysis system in one embodiment of the present application;
FIGS. 3A-3E are schematic diagrams of a data scanning cluster in one embodiment of the present application;
FIG. 4 is a schematic diagram of data format conversion in one embodiment of the present application;
FIGS. 5A and 5B are block diagrams of a data scanning cluster in an embodiment of the present application;
FIG. 6 is a schematic flow chart diagram of a data processing method according to an embodiment of the present application;
FIG. 7 is a block diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application.
Detailed Description
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein is meant to encompass any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present application to describe various information, the information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Moreover, depending on the context, the word "if" as used herein may be interpreted as "at the time of …", "when …", or "in response to a determination".
The embodiment of the present application provides a data processing method, which may be applied to any device, such as any device of a data lake analysis system. FIG. 1 is a flowchart of the method, and the method may include:
Step 101, acquiring a data processing request, where the data processing request includes a first input format (i.e., the format of the data in the data source) and a second output format (i.e., the format of the data to be output).
Step 102, acquiring a target processing unit, where target conversion information of the target processing unit is first conversion information, the first conversion information is used for realizing conversion between the first input format and the second output format, and the target processing unit can convert data in the first input format into data in the second output format based on the first conversion information.
Optionally, in an example, acquiring the target processing unit may include, but is not limited to: arbitrarily selecting a processing unit from a plurality of processing units of the data lake analysis system, and taking the selected processing unit as the target processing unit. Alternatively, the target conversion information of the plurality of processing units of the data lake analysis system may be acquired, a processing unit may be selected from the plurality of processing units using the target conversion information of each processing unit, and the selected processing unit is taken as the target processing unit.
In one example, for each processing unit of the data lake analysis system, the processing unit can be: a processing unit that is not currently operating (i.e., the processing unit is not currently performing a conversion operation of data), or a processing unit that is currently operating (i.e., the processing unit is currently performing a conversion operation of data).
In one example, selecting a processing unit from the plurality of processing units using the target conversion information of each processing unit, and taking the selected processing unit as the target processing unit, may include, but is not limited to: if there is a processing unit whose target conversion information is the first conversion information (used for realizing conversion between the first input format and the second output format), determining that processing unit as the target processing unit; alternatively, if there is no processing unit whose target conversion information is the first conversion information, arbitrarily selecting a processing unit from the plurality of processing units and determining the selected processing unit as the target processing unit.
Optionally, in an example, after the target processing unit is acquired, the method may further include, but is not limited to: if the target conversion information of the target processing unit is the first conversion information, keeping the target conversion information of the target processing unit unchanged according to the first input format and the second output format; or, if the target conversion information of the target processing unit is the second conversion information (the second conversion information is not used for realizing the conversion between the first input format and the second output format), modifying the target conversion information of the target processing unit into the first conversion information according to the first input format and the second output format.
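The selection and reconfiguration rules above can be sketched as follows. This is a minimal illustration, not the patented implementation: conversion information is modeled as an (input format, output format) pair, the `ProcessingUnit` class and `acquire_target_unit` function are hypothetical names, and the "arbitrary" choice is simply the first unit in the pool.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Assumption: a piece of conversion information is identified by its
# (input_format, output_format) pair.
Conversion = Tuple[str, str]


@dataclass
class ProcessingUnit:
    name: str
    # Conversion information the unit is currently set to use, if any.
    target_conversion: Optional[Conversion] = None


def acquire_target_unit(units: Sequence[ProcessingUnit],
                        input_format: str,
                        output_format: str) -> ProcessingUnit:
    """Prefer a unit already set to the needed conversion; otherwise pick one and reconfigure it."""
    if not units:
        raise ValueError("no processing units available")
    first_conversion = (input_format, output_format)
    for unit in units:
        if unit.target_conversion == first_conversion:
            # Target conversion information already matches: keep it unchanged.
            return unit
    # No match: the text allows an arbitrary choice; the first unit is used here.
    chosen = units[0]
    # Modify the target conversion information to the first conversion information.
    chosen.target_conversion = first_conversion
    return chosen
```

With a pool such as `[ProcessingUnit("pu0", ("A2", "B1")), ProcessingUnit("pu1", ("A1", "B1"))]`, a request for A1 to B1 would return `pu1` untouched, while a request for a pair no unit is set to would reconfigure `pu0`.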
Optionally, in one example, for step 102, it may first be determined whether the data lake analysis system supports conversion from the first input format to the second output format. If so, i.e., a processing unit of the data lake analysis system supports this conversion, the target processing unit is acquired from the plurality of processing units of the data lake analysis system. If not, i.e., none of the processing units of the data lake analysis system supports the conversion between the first input format and the second output format, the request is handled by the conventional process.
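A rough sketch of this support check, assuming the set of supported conversions is known up front and that both the accelerated path and the conventional path can be represented as plain callables (all names here are hypothetical):

```python
from typing import Callable, Set, Tuple

Conversion = Tuple[str, str]


def convert_with_fallback(data: bytes,
                          input_format: str,
                          output_format: str,
                          supported: Set[Conversion],
                          accelerated: Callable[[bytes], bytes],
                          conventional: Callable[[bytes], bytes]) -> Tuple[str, bytes]:
    """Route to a processing unit when the conversion is supported, else to the conventional path."""
    if (input_format, output_format) in supported:
        return "processing_unit", accelerated(data)
    # None of the processing units supports this pair: use the conventional process.
    return "conventional", conventional(data)
```

The returned label only makes the chosen path visible to the caller; a real system would more likely record it in logs or metrics.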
Optionally, in an example, for step 102, the data processing request may further include a number of fragments; the number of target processing units is determined according to the number of fragments, and that number of target processing units is acquired.
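The text does not say how the fragment count maps to a unit count. One possible rule, shown purely as an assumption, is one unit per fragment, capped by the number of available units, with fragments distributed round-robin:

```python
from typing import List, Sequence


def plan_units(num_fragments: int, available_units: Sequence[str]) -> List[List[int]]:
    """Return, for each chosen unit, the fragment indices it would handle.

    Assumed rule (not from the source): one unit per fragment, capped by pool size.
    """
    if num_fragments <= 0:
        raise ValueError("num_fragments must be positive")
    if not available_units:
        raise ValueError("no processing units available")
    unit_count = min(num_fragments, len(available_units))
    assignment: List[List[int]] = [[] for _ in range(unit_count)]
    for fragment in range(num_fragments):
        # Round-robin distribution of fragments over the chosen units.
        assignment[fragment % unit_count].append(fragment)
    return assignment
```

For five fragments and three available units this yields three target units, handling fragments `[0, 3]`, `[1, 4]` and `[2]` respectively.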
Step 103, acquiring first data in the first input format from a data source according to the data processing request, and outputting the first data to the target processing unit, so that the target processing unit converts the first data into second data in the second output format by using the first conversion information. The conversion process itself is not described in detail here.
Step 104, obtaining the second data from the target processing unit and outputting the second data, for example, the second data may be output to the computing node, so that the computing node performs processing by using the second data.
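Steps 101 to 104 can be read as a small pipeline. The sketch below models the target processing unit as a lookup in a table of converter functions, which is a simplification: in the described system the unit is a logic chip, and the request type, field names and callables here are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Conversion = Tuple[str, str]


@dataclass
class DataProcessingRequest:
    input_format: str   # first input format (format of data in the data source)
    output_format: str  # second output format (format of data to be output)
    source: str         # hypothetical data-source identifier


def handle_request(request: DataProcessingRequest,
                   read_source: Callable[[str], List[str]],
                   converters: Dict[Conversion, Callable[[str], str]],
                   emit: Callable[[List[str]], None]) -> None:
    # Step 101: the request has been acquired and is passed in.
    # Step 102: acquire the target processing unit (modeled as a converter lookup).
    convert = converters[(request.input_format, request.output_format)]
    # Step 103: read first data from the data source and hand it to the unit.
    first_data = read_source(request.source)
    second_data = [convert(record) for record in first_data]
    # Step 104: output the second data, e.g. to a compute node.
    emit(second_data)
```

Passing `emit` in as a callable keeps the sketch neutral about where the second data goes; the text names a compute node as one example.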
In one example, the data processing request may further include a service mode, and if the service mode is a traffic mode, the total amount of data may be acquired, and virtual resource information (e.g., cost information) may be determined according to the total amount of data, and the virtual resource information may be output. Or, if the service mode is the instance mode, the number of the target processing units may be acquired, the virtual resource information may be determined according to the number of the target processing units, and the virtual resource information may be output.
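The two service modes could be sketched as below. The pricing parameters (`price_per_gib`, `price_per_unit`) and the linear cost formulas are assumptions for illustration; the source only states which quantity the virtual resource information is derived from in each mode.

```python
from enum import Enum


class ServiceMode(Enum):
    TRAFFIC = "traffic"    # charged by total amount of data
    INSTANCE = "instance"  # charged by number of target processing units


def virtual_resource_cost(mode: ServiceMode, *,
                          total_bytes: int = 0,
                          unit_count: int = 0,
                          price_per_gib: float = 0.0,
                          price_per_unit: float = 0.0) -> float:
    """Compute hypothetical cost information for the given service mode."""
    if mode is ServiceMode.TRAFFIC:
        return total_bytes / 2**30 * price_per_gib
    if mode is ServiceMode.INSTANCE:
        return unit_count * price_per_unit
    raise ValueError(f"unknown service mode: {mode}")
```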
In the above embodiment, the target processing unit includes a plurality of different pieces of conversion information, and the different pieces of conversion information are used for implementing data conversion between different formats. The target processing unit is implemented by a logic chip, which may include but is not limited to: an FPGA (Field Programmable Gate Array), a CPLD (Complex Programmable Logic Device), an ASIC (Application Specific Integrated Circuit), etc., without limitation.
The above execution sequence is only an example given for convenience of description. In practical applications, the execution sequence of the steps may be changed, and the execution sequence is not limited. Moreover, in other embodiments, the steps of the respective methods do not have to be performed in the order shown and described herein, and the methods may include more or fewer steps than those described herein. A single step described in this specification may be broken down into multiple steps in other embodiments, and multiple steps described in this specification may be combined into a single step in other embodiments.
Based on the above technical solution, in the embodiments of the present application, the data format conversion is performed by the target processing unit, which is usually implemented by a logic chip and has high processing performance. This saves the computing resources (such as CPU resources) of the data lake analysis system, improves its overall processing performance, usage efficiency and user experience, and accelerates data processing and computation. Hardware acceleration is used to handle data interfacing with the storage cluster and to provide a data interface to the computing cluster.
Based on the same inventive concept as the above method, the embodiment of the present application further provides another data processing method, which can be applied to a data lake analysis system (e.g., a cloud computing platform in the data lake analysis system), where the data lake analysis system is configured to provide a serverless data processing service for a user, and the method includes:
acquiring a data processing request, wherein the data processing request comprises a first input format and a second output format; and acquiring target processing units from a plurality of processing units of the data lake analysis system, wherein target conversion information of the target processing units is first conversion information, and the first conversion information is used for realizing conversion between a first input format and a second output format. Acquiring first data in a first input format from a data source according to a data processing request, and outputting the first data to a target processing unit so that the target processing unit converts the first data into second data in a second output format by using the first conversion information; acquiring the second data from the target processing unit and outputting the second data; wherein the data source comprises a cloud database provided by a data lake analysis system.
The data source may include a cloud database provided by the data lake analysis system, and the cloud database may be used to provide a serverless query analysis service. The data lake analysis system can be a storage type cloud platform mainly based on data storage, or a computing type cloud platform mainly based on data processing, or a comprehensive cloud computing platform taking both computing and data storage processing into consideration, and the data lake analysis system is not limited.
The cloud database provided by the data lake analysis system can be used for providing a Serverless (Serverless) query analysis service for users, can analyze and query mass data in any dimension, and supports functions of high concurrency, low delay (millisecond response), real-time online analysis, mass data query and the like.
In one example, the data lake analysis system is a system in which storage and computation are separated. The data lake analysis system includes a storage cluster and a computing cluster; the storage cluster includes a plurality of data sources in different input formats, and the computing cluster includes a plurality of compute nodes in different output formats. Further, the data lake analysis system may include a data scanning cluster that includes a plurality of processing units. The data scanning cluster may be a built-in module of the computing cluster, deployed on the same nodes as the computing resources of the computing cluster; or an independent module of the computing cluster, deployed on different nodes from the computing resources of the computing cluster; or a cluster independent of the computing cluster.
Based on the same inventive concept as the above method, the embodiment of the present application further provides a data processing method, which may include: acquiring a data processing request, wherein the data processing request may comprise a first input format and a second output format; acquiring first data in the first input format from a data source according to the data processing request, and outputting the first data to a target processing unit so that the target processing unit converts the first data into second data in the second output format; and acquiring the second data from the target processing unit and outputting the second data.
Based on the same inventive concept as the above method, the embodiment of the present application further provides a data processing method applied to a data lake analysis system, where the data lake analysis system includes a plurality of processing units, each processing unit in the plurality of processing units includes a plurality of different pieces of conversion information, and the different pieces of conversion information are used for implementing data conversion between different formats, and the method includes:
the processing unit acquires first data in a first input format; if the target conversion information of the processing unit is first conversion information and the first conversion information is used for realizing the conversion between the first input format and the second output format, the processing unit converts the first data into second data of the second output format by using the first conversion information; the processing unit outputs the second data.
In one example, before the processing unit converts the first data into the second data in the second output format by using the first conversion information, if the target conversion information of the processing unit is not the first conversion information, the processing unit modifies its target conversion information into the first conversion information.
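Seen from the processing unit's side, the behavior above might look like the following software stand-in. It is only a sketch: a real unit would be a logic chip such as an FPGA, and the class name, the callable-based conversion table and the string payloads are assumptions.

```python
from typing import Callable, Dict, Optional, Tuple

Conversion = Tuple[str, str]


class SimulatedProcessingUnit:
    """Software stand-in for a processing unit holding several pieces of conversion information."""

    def __init__(self, conversions: Dict[Conversion, Callable[[str], str]]) -> None:
        self._conversions = conversions
        # Currently selected (target) conversion information, if any.
        self.target: Optional[Conversion] = None

    def convert(self, first_data: str, input_format: str, output_format: str) -> str:
        needed = (input_format, output_format)
        if needed not in self._conversions:
            raise KeyError(f"unit does not hold conversion information for {needed}")
        if self.target != needed:
            # Target conversion information is not the first conversion information: switch to it.
            self.target = needed
        # Convert the first data into second data using the target conversion information.
        return self._conversions[self.target](first_data)
```

A unit built with two conversions can serve requests for either pair, switching its `target` only when the requested pair differs from the current one.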
The data processing method is further described below with reference to specific application scenarios.
FIG. 2 is a schematic structural diagram of a data lake analysis (Data Lake Analytics) system. The data lake analysis system may include a client, a load balancing device, front-end nodes (also referred to as front-end servers), compute nodes (also referred to as compute servers), and databases. The data lake analysis system may also include other servers, which is not limited here.
FIG. 2 shows 3 front-end nodes and 4 compute nodes as an example; in practical applications, there may be other numbers of front-end nodes and compute nodes, which is not limited here. Since the processing flow of each front-end node is the same, and the processing flow of each compute node is the same, for convenience of description, the following embodiments take the processing flow of 1 front-end node and the processing flow of 1 compute node as an example.
FIG. 2 shows 5 databases as an example; in practical applications, there may be other numbers of databases, which is not limited here. These databases are the data sources. In this embodiment, the data sources may be heterogeneous, that is, the databases may be of the same type or of different types, and they may be relational or non-relational.
Further, the type of each database may include, but is not limited to: OSS (Object Storage Service), TableStore, HBase (Hadoop Database), HDFS (Hadoop Distributed File System), MySQL (relational database), RDS (Relational Database Service), DRDS (Distributed Relational Database Service), RDBMS (Relational Database Management System), SQLServer (relational database), PostgreSQL (object-relational database), MongoDB (database based on distributed file storage), and so on. These are only a few examples and do not limit the types of databases.
The database is used to store various types of data, and the data types are not limited, for example, the data may be user data, commodity data, map data, video data, image data, audio data, and the like.
The client may be an APP (Application) included in a terminal device (e.g., a Personal Computer (PC), a notebook computer, a mobile terminal, etc.), or may be a browser included in the terminal device, which is not limited here. The load balancing device balances the load of the client's data requests; for example, after receiving data requests, it distributes them across the front-end nodes.
In one example, multiple front-end nodes may be used to provide the same functionality, forming a resource pool of front-end nodes. Each front-end node in the resource pool is used for receiving a data request sent by the client, performing SQL (Structured Query Language) parsing on the data request, generating a plurality of execution plans according to the parsing result, and processing the execution plans. For example, the front-end node may send these execution plans to one or more compute nodes, which process the execution plans.
In one example, multiple compute nodes are used to provide the same functionality, forming a resource pool of compute nodes. For each computing node in the resource pool, if the computing node receives the execution plan sent by the front-end node, the computing node may process the execution plan and return the processing result to the front-end node.
In summary, the data lake analysis system adopts an architecture in which storage and computation are separated, and the compute nodes read data from different data sources, which are databases of various types.
In one example, the data lake analysis system adopts an architecture in which storage and computation are separated, i.e., the data lake analysis system includes a storage cluster and a computing cluster; the storage cluster includes a plurality of data sources (i.e., databases) in different input formats, and the computing cluster includes a plurality of compute nodes in different output formats. On this basis, in this embodiment of the present application, the data lake analysis system may further include a data scanning cluster, and the data scanning cluster may include a plurality of processing units, for example, processing units implemented by FPGAs.
Referring to fig. 3A, the data scanning cluster may be an independent module of the computing cluster, deployed at different nodes from the computing resources (e.g., CPU resources, etc.) of the computing cluster. That is, the processing units of the data scanning cluster are deployed in the computing cluster, but on different nodes from its computing resources. Specifically, in the data lake analysis system with separated storage and computation, the data scanning cluster serves as a module in the computing cluster, and is the functional module in the computing cluster that directly faces the storage cluster.
Referring to fig. 3B, the data scanning cluster may be a built-in module of the computing cluster, deployed at the same node as the computing resources (such as CPU resources) of the computing cluster. That is, a processing unit of the data scanning cluster is deployed in the computing cluster as a built-in module of a computing node, and is located at the same node as the CPU-based operators. The computing task scheduling determines whether to enable the data scanning cluster to perform data format conversion; if not, the data format conversion is implemented by the CPU-based software module of the computing node.
Referring to fig. 3C, the data scanning cluster may be an independent cluster different from the computing cluster. In the data lake analysis system with separated storage and computation, the data scanning cluster serves as a functional module facing the computing cluster on one side and the storage cluster on the other. The data scanning cluster is a completely independent cluster on the cloud, can concurrently respond, in the form of a service, to data scanning requests from a plurality of different computing clusters on the cloud, runs completely independently, and has its own elastic cluster management and scaling capability.
For convenience of description, in the following embodiments, the data scanning cluster is taken as an independent cluster as an example.
In one example, a data lake analysis system can include a plurality of computing clusters, each computing cluster including a plurality of computing nodes. For each computing cluster, the computing cluster may be a computing cluster for SQL (Structured Query Language) computing, a computing cluster for machine Learning, or a computing cluster for Deep Learning (Deep Learning), which is not limited herein.
Specifically, referring to fig. 3D, these computing clusters may include, but are not limited to: Presto-based computing clusters, Spark-based computing clusters, Hadoop-based computing clusters, Flink-based computing clusters, TensorFlow-based computing clusters, PyTorch-based computing clusters, and the like.
For a Presto-based computing cluster, a data access interface adapted to Presto is provided, that is, the data output to the computing cluster is data matching the format of Presto data. For a Spark-based computing cluster, a Spark-adapted data access interface is provided, that is, the data output to the computing cluster is data that matches the Spark data format. For a Hadoop-based computing cluster, a Hadoop-adapted data access interface is provided, that is, the data output to the computing cluster is data matched with the Hadoop data format. For a Flink-based computing cluster, a data access interface adapted to the Flink is provided, i.e. the data output to the computing cluster is data matching the Flink data format. For a TensorFlow-based computing cluster, a TensorFlow-adapted data access interface is provided, that is, data output to the computing cluster is data matching the TensorFlow data format. For a PyTorch-based computing cluster, a data access interface adapted to PyTorch is provided, that is, data output to the computing cluster is data matching the PyTorch data format, and so on.
In one example, the data lake analysis system may include a storage cluster including a plurality of data sources, where the data sources may be databases, such as cloud databases, and the cloud databases are used to provide a Serverless (Serverless) query analysis service for users, enable analysis and query of mass data in any dimension, support high concurrency, low latency (millisecond response), real-time online analysis, mass data query, and the like.
In one example, these data sources may include, but are not limited to: OSS-based data sources, TableStore-based data sources, HBase-based data sources, HDFS-based data sources, MySQL-based data sources, RDS-based data sources, DRDS-based data sources, RDBMS-based data sources, PostgreSQL-based data sources, and the like. Of course, the above is merely an example, and no limitation is made thereto.
Referring to fig. 3E, because the data sources are of different types, the data formats of the data in the data sources differ. For example, the data formats may include, but are not limited to: parquet data format, orc data format, text data format, json data format, kv data format, rcfile data format, avro data format, arrow data format, and the like. Of course, the above is only an example, and other data formats are possible, which is not limited herein.
In summary, since the data format of the data source is different from the data format of the computing cluster, the data format conversion is required so that the computing cluster can correctly process the data. For example, if the data format of the data source is json data format and the computing cluster is Presto-based computing cluster, the json data format data needs to be converted into data matching the Presto data format.
In the embodiment of the present application, the data format conversion is realized by providing the data scanning cluster, that is, the data format conversion is realized by a processing unit (such as an FPGA, etc.) in the data scanning cluster.
In one example, in order to implement the conversion of the data format, conversion information may be configured in a processing unit (such as an FPGA, etc.), and the processing unit may implement the conversion of the data format by using the conversion information, and the content of the conversion information is not limited as long as the processing unit can implement the conversion of the data format by using the conversion information.
For example, conversion information a1 is configured in the processing unit in advance, and based on the conversion information a1, the processing unit can convert data in the json data format into data matching the Presto data format.
In one example, a plurality of different conversion information may be configured at a processing unit (e.g., FPGA, etc.), and the different conversion information is used to implement data conversion in different formats. For example, conversion information a1, conversion information a2, conversion information a3, conversion information a4, and so on are configured at the processing unit. Based on the conversion information a1, the processing unit can convert data in the json data format into data matching the Presto data format. Based on the conversion information a2, the processing unit can convert data in the json data format into data matching the Spark data format. Based on the conversion information a3, the processing unit can convert data in the text data format into data matching the Presto data format. Based on the conversion information a4, the processing unit can convert data in the text data format into data matching the Spark data format, and so on.
Of course, the above is only an example of the conversion information, and in practical applications, more conversion information may be configured in the processing unit to implement conversion of various data formats, as shown in fig. 4, which is a schematic diagram of data format conversion. The first column represents the data formats supported by the data source and the first row represents the data formats supported by the compute cluster. "yes" in fig. 4 indicates that the conversion of the two data formats is supported, and "no" in fig. 4 indicates that the conversion of the two data formats is not supported. Based on this, a plurality of conversion information may be configured at the processing unit to make the processing unit support conversion of "yes" corresponding two data formats by these conversion information.
In summary, since the processing unit is configured with a plurality of different conversion information, and the different conversion information is used for implementing data conversion of different formats, the computing power of the processing unit can be fully utilized and its utilization rate improved. For example, if the processing unit is configured only with the conversion information a1, the processing unit can only convert data in the json data format into data matching the Presto data format. When there is no task of "converting data in json data format into data matching Presto data format", the processing unit is idle, and its computing power is wasted. If the processing unit is configured with the conversion information a1 and the conversion information a2, the processing unit can convert data in the json data format into data matching the Presto data format, and can also convert data in the json data format into data matching the Spark data format. When there is no task of converting data in the json data format into data matching the Presto data format, the processing unit can instead convert data in the json data format into data matching the Spark data format, which avoids leaving the processing unit idle and improves the utilization of its computing capacity.
In one example, the usage of the processing units in the data scanning cluster is relatively fixed for speeding up the data scanning tasks of different computing clusters. Referring to fig. 5A, the data scan cluster may include basic modules of instruction storage, data storage, constant storage, register sets, data storage linked lists, instruction execution, and the like. Further, as shown in fig. 5B, the data scanning cluster may further include a plurality of processing units (e.g., FPGAs, etc.), each processing unit is configured to perform conversion of different data formats, and the data scanning cluster may further include a scheduling and management module, an input module, an output module, and the like.
In the above application scenario, referring to fig. 6, which is a flowchart of a data processing method provided in the embodiment of the present application, the method may be applied to a data scanning cluster of a data lake analysis system and may include:
Step 601, a data processing request, such as a data scan request, is obtained.
Specifically, the client may send a data processing request to the data lake analysis system through the load balancing device, so that the data scanning cluster of the data lake analysis system may obtain the data processing request. For example, the scheduling and management module of the data scanning cluster may obtain the data processing request.
Step 602, determine whether the data lake analysis system supports the data format conversion corresponding to the data processing request. If yes, go to step 603; if not, prompt that the data processing request is not supported.
Specifically, the data processing request may include an input data format (i.e., the format of the data in the data source; for ease of distinction, referred to hereinafter as the first input format, such as the json data format) and an output target format (i.e., the format of the data to be output; for ease of distinction, referred to hereinafter as the second output format, such as the Presto data format). It may therefore be determined whether the data lake analysis system supports conversion between the first input format and the second output format. If so, step 603 is performed; if not, a prompt is given that the data processing request is not supported.
For example, the scheduling and management module of the data scanning cluster may obtain the first input format and the second output format from the data processing request, and query whether the data lake analysis system supports the conversion of the first input format and the second output format. Specifically, assume that the data lake analysis system includes a capability registry and the capability registry is used to record the conversions of all data formats supported by the data lake analysis system, and the capability registry is shown with reference to fig. 4. If the first input format and/or the second output format do not exist in the capability registry, determining that the data lake analysis system does not support the conversion between the first input format and the second output format; if the first input format and the second output format exist in the capability registry and the correspondence between the first input format and the second output format is 'no', determining that the data lake analysis system does not support the conversion between the first input format and the second output format; and if the first input format and the second output format exist in the capability registry and correspond to 'yes', determining that the data lake analysis system supports the conversion of the first input format and the second output format.
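The three cases of the capability registry lookup described above can be sketched as follows. This is only an illustrative sketch: the dictionary layout, the format names used as keys, and the function name `supports_conversion` are assumptions made for the example and are not defined by the present application.

```python
from typing import Dict, Tuple

# Hypothetical capability registry in the spirit of fig. 4:
# (input format, output format) -> whether the conversion is supported.
CAPABILITY_REGISTRY: Dict[Tuple[str, str], bool] = {
    ("json", "presto"): True,
    ("json", "spark"): True,
    ("text", "presto"): True,
    ("text", "spark"): True,
    ("kv", "presto"): False,
}


def supports_conversion(registry: Dict[Tuple[str, str], bool],
                        first_input_format: str,
                        second_output_format: str) -> bool:
    """Return True only if both formats are registered and marked 'yes'."""
    known_inputs = {i for i, _ in registry}
    known_outputs = {o for _, o in registry}
    # Case 1: either format is absent from the registry.
    if first_input_format not in known_inputs:
        return False
    if second_output_format not in known_outputs:
        return False
    # Cases 2 and 3: both formats exist; the entry is 'no' or 'yes'.
    # A missing pair is treated as 'no' (an assumption of this sketch).
    return registry.get((first_input_format, second_output_format), False)
```

With this sketch, a request for json to Presto would proceed to step 603, while a request naming an unregistered format would be rejected in step 602.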
Step 603, selecting a target processing unit from the plurality of processing units of the data lake analysis system.
Wherein, for each processing unit of the plurality of processing units, the processing unit may be: a processing unit that is not currently operating (i.e., the processing unit is not currently performing a conversion operation of data), or a processing unit that is currently operating (i.e., the processing unit is currently performing a conversion operation of data).
For example, a target processing unit may be selected from processing unit 1, processing unit 2, and processing unit 3, processing unit 1 may be a currently non-operating processing unit or a currently operating processing unit, processing unit 2 may be a currently non-operating processing unit or a currently operating processing unit, and so on.
In an example, the data processing request may further include a service mode. If the service mode is a traffic mode, the user is charged according to the total amount of data; on this basis, the user may share processing units with other users, so the plurality of processing units of the data lake analysis system may be processing units that are not currently operating or processing units that are currently operating, that is, either kind may serve as a target processing unit. If the service mode is an instance mode, the user is charged according to the number of processing units; on this basis, the user uses the processing units exclusively, so the plurality of processing units of the data lake analysis system may be processing units that are not currently operating, that is, a processing unit that is not currently operating may serve as a target processing unit.
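As a rough illustration of how the service mode narrows the set of candidate processing units, consider the following sketch. The mode strings "traffic" and "instance", the dictionary fields, and the function name `candidate_units` are hypothetical and chosen only for this example.

```python
from typing import Dict, List


def candidate_units(units: List[Dict], service_mode: str) -> List[Dict]:
    """Filter processing units that may serve as target processing units."""
    if service_mode == "traffic":
        # Shared use: both idle and currently operating units qualify.
        return list(units)
    if service_mode == "instance":
        # Exclusive use: only units that are not currently operating qualify.
        return [u for u in units if not u["busy"]]
    raise ValueError(f"unknown service mode: {service_mode}")


# Hypothetical processing units for the usage example.
UNITS = [
    {"name": "unit1", "busy": False},
    {"name": "unit2", "busy": True},
    {"name": "unit3", "busy": False},
]
```

Under these assumptions, the traffic mode would keep all three units as candidates, while the instance mode would keep only unit1 and unit3.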
In one example, a processing unit can be arbitrarily selected from a plurality of processing units of the data lake analysis system, and the selected processing unit can be used as a target processing unit. Alternatively, target conversion information of a plurality of processing units of the data lake analysis system may be acquired, and a processing unit may be selected from the plurality of processing units using the target conversion information of each processing unit, and the selected processing unit may be taken as a target processing unit.
For example, assuming that the scheduling and management module of the data scanning cluster needs to select a target processing unit from processing unit 1, processing unit 2, and processing unit 3, the following manner is adopted: it is possible to randomly select a processing unit, such as the processing unit 1, from among these processing units, and to take the processing unit 1 as a target processing unit; alternatively, a processing unit, such as the processing unit 2, is selected from the processing units 1, 2, and 3 based on the target conversion information of these processing units, and the processing unit 2 is set as the target processing unit.
Wherein, selecting a processing unit from the plurality of processing units by using the target conversion information of each processing unit, and taking the selected processing unit as the target processing unit, may include but is not limited to: if there is a processing unit whose target conversion information is first conversion information (for realizing conversion of the first input format and the second output format), the processing unit whose target conversion information is the first conversion information may be determined as a target processing unit; alternatively, if there is no processing unit whose target conversion information is the first conversion information, a processing unit may be randomly selected from a plurality of processing units, and the selected processing unit may be determined as the target processing unit.
Wherein the target conversion information is the conversion information that is currently enabled by the processing unit, i.e., the conversion information that is currently being used by the processing unit. For example, the processing unit is configured with conversion information a1 (for converting data in json data format into data matching Presto data format) and conversion information a2 (for converting data in json data format into data matching Spark data format). If the target conversion information is conversion information a1, it indicates that the processing unit is currently used to convert data in json data format into data matching Presto data format, and is not used to convert data in json data format into data matching Spark data format. If the target conversion information is conversion information a2, it indicates that the processing unit is currently used to convert data in json data format into data matching Spark data format, and so on.
Assuming that the first input format is a json data format and the second output format is a Presto data format, the first conversion information is conversion information a1, that is, the first conversion information is used to realize conversion between the json data format and the Presto data format. If the target conversion information of processing unit 1 is conversion information a1, the target conversion information of processing unit 1 is the first conversion information, and processing unit 1 may be determined as the target processing unit.
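The selection rule described above (prefer a processing unit whose target conversion information is already the first conversion information, otherwise pick one at random) might be sketched as follows. The `ProcessingUnit` class, its fields, and `select_target_unit` are illustrative names assumed for this sketch only.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProcessingUnit:
    name: str
    target_conversion: str  # currently enabled conversion information, e.g. "a1"


def select_target_unit(units: List[ProcessingUnit],
                       first_conversion: str,
                       rng: Optional[random.Random] = None) -> ProcessingUnit:
    """Pick a unit already set to first_conversion, else a random unit."""
    for unit in units:
        if unit.target_conversion == first_conversion:
            return unit
    # No unit currently uses the first conversion information: choose randomly.
    if rng is None:
        rng = random.Random()
    return rng.choice(units)
```

Preferring a unit that already has the matching target conversion information means step 604 can leave that unit unchanged rather than switching it.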
In one example, the data processing request may further include a slice number indicating a number of processing units that the user needs to use, and thus, the number of target processing units may be further determined according to the slice number, and then the number of target processing units may be selected from the plurality of processing units of the data lake analysis system.
For example, if the number of slices is 5, that is, the number of processing units is 5, it is necessary to select 5 target processing units from the plurality of processing units of the data lake analysis system, and the specific selection manner is as in the above embodiment.
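One possible way to pick as many target processing units as the slice number requires is sketched below. It is a simplification: units that already match the first conversion information are taken first, and the remainder are taken in list order rather than at random. The function name `select_units` and the dictionary fields are hypothetical.

```python
from typing import Dict, List


def select_units(units: List[Dict], first_conversion: str,
                 slice_count: int) -> List[Dict]:
    """Select slice_count units, preferring those already set to first_conversion."""
    if slice_count > len(units):
        raise ValueError("not enough processing units for the requested slices")
    # sorted() is stable and False sorts before True, so matching units come
    # first and the original order is preserved within each group.
    ranked = sorted(units, key=lambda u: u["target_conversion"] != first_conversion)
    return ranked[:slice_count]
```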
In step 604, target conversion information of a target processing unit (e.g., one or more target processing units, such as 5 target processing units) is set as first conversion information according to the first input format and the second output format.
The first conversion information is used to realize conversion between the first input format and the second output format, that is, to convert data in the first input format into data in the second output format.
Specifically, if the target conversion information of the target processing unit is the first conversion information, the target conversion information of the target processing unit is kept unchanged according to the first input format and the second output format; or, if the target conversion information of the target processing unit is the second conversion information (the second conversion information is not used for realizing the conversion between the first input format and the second output format), modifying the target conversion information of the target processing unit from the second conversion information to the first conversion information according to the first input format and the second output format.
For example, assuming that the first input format is a json data format and the second output format is a Presto data format, the first conversion information is conversion information a1, that is, the first conversion information is used to implement conversion between the json data format and the Presto data format. Further, if the target conversion information of the target processing unit is the conversion information a1, the target conversion information of the target processing unit may be kept unchanged, i.e., the target conversion information is still the conversion information a1. If the target conversion information of the target processing unit is conversion information a2 (for implementing conversion between json data format and Spark data format), the target conversion information of the target processing unit may be modified to conversion information a1, so that the target processing unit is no longer used for implementing conversion between json data format and Spark data format, but is used for implementing conversion between json data format and Presto data format.
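Step 604 amounts to a keep-or-switch decision on the target conversion information, which can be sketched as follows. The `ProcessingUnit` class, its `configured` field, and the check that the requested conversion information is among those configured on the unit are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass
class ProcessingUnit:
    name: str
    configured: FrozenSet[str]  # all conversion information configured on the unit
    target_conversion: str      # conversion information currently enabled


def set_target_conversion(unit: ProcessingUnit, first_conversion: str) -> bool:
    """Make first_conversion the target; return True if the unit was switched."""
    if first_conversion not in unit.configured:
        # Assumption: a unit can only enable conversion information it holds.
        raise ValueError(f"{first_conversion} is not configured on {unit.name}")
    if unit.target_conversion == first_conversion:
        return False  # already the first conversion information: keep unchanged
    unit.target_conversion = first_conversion  # e.g. a2 -> a1
    return True
```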
Wherein steps 601-604 may be performed by a scheduling and management module of the data scanning cluster.
In step 605, first data in a first input format (data in the data source may be referred to as first data) is obtained from the data source according to the data processing request, and the first data is output to the target processing unit.
Specifically, the data processing request may include information of a data source, and based on the information of the data source, the first data may be acquired from the data source, and a data format of the first data is a first input format, which is not described in detail again for this acquisition process. The first data in the first input format may then be output to the target processing unit.
For example, an input module of the data scanning cluster may obtain first data in a first input format from a data source and output the first data in the first input format to the target processing unit.
In step 606, the target processing unit converts the first data into second data in a second output format (the converted data is referred to as second data) by using the first conversion information, and the conversion process is not described again.
Specifically, referring to the above embodiment, the target conversion information of the target processing unit is the first conversion information, such as conversion information a1, which is used to implement the conversion between the json data format and the Presto data format. Assuming that the first input format is the json data format and the second output format is the Presto data format, the data format of the first data is the json data format, and the target processing unit can convert the first data in the json data format into the second data in the Presto data format using the conversion information a1.
Step 607, obtain the second data of the second output format from the target processing unit, and output the second data.
For example, the output module of the data scanning cluster acquires second data in a second output format from the target processing unit, such as second data in the Presto data format, and outputs the second data in the Presto data format to a computing node, such as a computing node within the Presto-based computing cluster. Since the second data in the Presto data format is output to the compute node, the compute node can process using the second data.
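The flow of steps 605 to 607 (input module, target processing unit, output module) can be illustrated in software as below. The application does not describe the internal layout of the Presto data format, so this sketch uses a simple column-oriented dictionary as a stand-in target layout; the converter, the `CONVERTERS` table, and `run_scan` are all hypothetical.

```python
import json
from typing import Callable, Dict, List


def json_rows_to_columns(first_data: str) -> Dict[str, List]:
    """Stand-in for conversion information a1: JSON lines -> column lists."""
    rows = [json.loads(line) for line in first_data.splitlines() if line.strip()]
    columns: Dict[str, List] = {}
    for row in rows:
        for key in row:
            columns.setdefault(key, [])
    for row in rows:
        for key in columns:
            columns[key].append(row.get(key))  # missing fields become None
    return columns


# Hypothetical mapping from conversion information to a converter function.
CONVERTERS: Dict[str, Callable[[str], Dict[str, List]]] = {
    "a1": json_rows_to_columns,
}


def run_scan(read_source: Callable[[], str], target_conversion: str,
             emit: Callable[[Dict[str, List]], None]) -> None:
    first_data = read_source()                               # step 605
    second_data = CONVERTERS[target_conversion](first_data)  # step 606
    emit(second_data)                                        # step 607
```

In this sketch, `read_source` plays the role of the input module, the converter plays the role of the target processing unit, and `emit` plays the role of the output module that hands the second data to a computing node.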
In one example, the data processing request may further include a service mode. If the service mode is a traffic mode (i.e., a shared service type), the user is charged according to the total amount of data; on this basis, the user may share processing units with other users, so the total amount of data (i.e., the total amount of data read from the data source) may be obtained, virtual resource information (e.g., cost information) may be determined according to the total amount of data, and the virtual resource information may be output, e.g., to the user. If the service mode is an instance mode (i.e., an exclusive instance type), the user is charged according to the number of processing units; on this basis, the user uses the processing units exclusively, so the number of target processing units may be obtained, virtual resource information (e.g., cost information) may be determined according to the number of target processing units, and the virtual resource information may be output, e.g., to the user.
The above execution sequence is only an example given for convenience of description. In practical applications, the execution sequence of the steps may be changed and is not limited. Moreover, in other embodiments, the steps of the respective methods do not have to be performed in the order shown and described herein, and the methods may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
Based on the above technical scheme, in the embodiment of the present application, the data format conversion is implemented by the target processing unit. The target processing unit is usually implemented by a logic chip and has high processing performance, which saves the computing resources (such as CPU resources) of the data lake analysis system, improves the overall processing performance, use efficiency and user experience of the data lake analysis system, and accelerates data processing and computation. By combining hardware acceleration technology, the data scanning cluster handles data docking with the storage cluster and provides a data interface to the computing cluster. The data scanning cluster in this embodiment has good universality and productization capability: it greatly broadens the range of computing clusters that can be docked with and accelerated, greatly improves the productization capability of the cloud product, and provides FPGA data scanning acceleration services in multiple modes. A universal FPGA data scanning engine is provided, in which input and output support for multiple data formats can be built, and a specific FPGA data scanning acceleration core can be developed for a specific computing engine.
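The two charging rules can be sketched as a single function. The prices, the per-gigabyte unit, the mode strings, and the function name `virtual_resource_cost` are assumptions made for illustration; the application only states that the cost depends on the total amount of data or on the number of target processing units.

```python
def virtual_resource_cost(service_mode: str,
                          total_bytes: int = 0,
                          unit_count: int = 0,
                          price_per_gb: float = 0.0,
                          price_per_unit: float = 0.0) -> float:
    """Compute hypothetical cost information for the given service mode."""
    if service_mode == "traffic":
        # Shared service type: charge by total data read from the data source.
        return total_bytes / 1024 ** 3 * price_per_gb
    if service_mode == "instance":
        # Exclusive instance type: charge by number of target processing units.
        return unit_count * price_per_unit
    raise ValueError(f"unknown service mode: {service_mode}")
```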
Based on the same application concept as the method, an embodiment of the present application further provides a data processing apparatus, as shown in fig. 7, which is a structural diagram of the data processing apparatus, and the data processing apparatus includes:
an obtaining module 71, configured to obtain a data processing request, where the data processing request includes a first input format and a second output format; acquiring a target processing unit, wherein target conversion information of the target processing unit is first conversion information, and the first conversion information is used for realizing conversion between the first input format and the second output format;
a processing module 72, configured to obtain first data in a first input format from a data source according to the data processing request, and output the first data to the target processing unit, so that the target processing unit converts the first data into second data in a second output format by using the first conversion information;
and acquiring the second data from the target processing unit and outputting the second data.
The obtaining module 71 is specifically configured to, when obtaining the target processing unit: target conversion information of a plurality of processing units of the data lake analysis system is acquired, and a processing unit is selected from the plurality of processing units as a target processing unit by using the target conversion information.
In one example, the processing module 72 is further configured to:
if the target conversion information of the target processing unit is the first conversion information, keeping the target conversion information of the target processing unit unchanged according to the first input format and the second output format; or,
and if the target conversion information of the target processing unit is the second conversion information, modifying the target conversion information of the target processing unit into the first conversion information according to the first input format and the second output format.
Based on the same application concept as the method, an embodiment of the present application further provides a data processing apparatus, including: a processor and a machine-readable storage medium having stored thereon a plurality of computer instructions, the processor when executing the computer instructions performs:
acquiring a data processing request, wherein the data processing request comprises a first input format and a second output format;
acquiring a target processing unit, wherein target conversion information of the target processing unit is first conversion information, and the first conversion information is used for realizing conversion between the first input format and the second output format;
acquiring first data in a first input format from a data source according to the data processing request, and outputting the first data to the target processing unit so that the target processing unit converts the first data into second data in a second output format by using the first conversion information;
and acquiring the second data from the target processing unit and outputting the second data.
The embodiment of the application also provides a machine-readable storage medium, wherein a plurality of computer instructions are stored on the machine-readable storage medium; the computer instructions when executed perform the following:
acquiring a data processing request, wherein the data processing request comprises a first input format and a second output format;
acquiring a target processing unit, wherein target conversion information of the target processing unit is first conversion information, and the first conversion information is used for realizing conversion between the first input format and the second output format;
acquiring first data in a first input format from a data source according to the data processing request, and outputting the first data to the target processing unit so that the target processing unit converts the first data into second data in a second output format by using the first conversion information;
and acquiring the second data from the target processing unit and outputting the second data.
Referring to fig. 8, which is a structural diagram of a data processing device proposed in the embodiment of the present application, the data processing device 80 may include: processor 81, network interface 82, bus 83, and memory 84. The memory 84 may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the memory 84 may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., a hard drive), a solid state drive, or any type of storage disk (e.g., a compact disc, a DVD, etc.).
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and each unit is described separately. Of course, when implementing the present application, the functionality of the units may be implemented in one or more pieces of software and/or hardware.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Furthermore, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (16)

1. A method of data processing, the method comprising:
acquiring a data processing request, wherein the data processing request comprises a first input format and a second output format;
acquiring a target processing unit, wherein target conversion information of the target processing unit is first conversion information, and the first conversion information is used for realizing conversion between the first input format and the second output format;
acquiring first data in a first input format from a data source according to the data processing request, and outputting the first data to the target processing unit so that the target processing unit converts the first data into second data in a second output format by using the first conversion information;
and acquiring the second data from the target processing unit and outputting the second data.
2. The method of claim 1,
wherein acquiring the target processing unit comprises:
acquiring target conversion information of a plurality of processing units of a data lake analysis system, and selecting a processing unit from the plurality of processing units as the target processing unit by using the target conversion information.
3. The method of claim 2, wherein selecting a processing unit from the plurality of processing units as the target processing unit by using the target conversion information comprises:
if a processing unit whose target conversion information is the first conversion information exists, determining that processing unit as the target processing unit; or,
if no processing unit whose target conversion information is the first conversion information exists, selecting a processing unit from the plurality of processing units and determining the selected processing unit as the target processing unit.
4. The method of claim 1,
wherein after acquiring the target processing unit, the method further comprises:
if the target conversion information of the target processing unit is the first conversion information, keeping the target conversion information of the target processing unit unchanged according to the first input format and the second output format; or,
if the target conversion information of the target processing unit is second conversion information, modifying the target conversion information of the target processing unit into the first conversion information according to the first input format and the second output format.
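The selection and reconfiguration logic of claims 2 to 4 can be sketched as below. The claims do not state how a unit is chosen when none is already configured for the first conversion information, so taking the first unit in the list is purely an assumption, as are the names `Unit` and `select_target_unit` and the example format names.

```python
from dataclasses import dataclass
from typing import List, Tuple

FormatPair = Tuple[str, str]  # (input format, output format)


@dataclass
class Unit:
    """Hypothetical processing unit with its current target conversion."""
    name: str
    target_conversion: FormatPair


def select_target_unit(units: List[Unit], first_conversion: FormatPair) -> Unit:
    """Prefer a unit already set to first_conversion; otherwise reconfigure one."""
    for unit in units:
        if unit.target_conversion == first_conversion:
            # Already matches: the target conversion information stays unchanged.
            return unit
    if not units:
        raise LookupError("no processing units available")
    # Assumption: the fallback policy is unspecified, so pick the first unit.
    chosen = units[0]
    # The unit held other (second) conversion information: modify it.
    chosen.target_conversion = first_conversion
    return chosen
```

Preferring a unit that is already configured avoids the modification step whenever possible; a real scheduler would presumably also weigh load, which this sketch ignores.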
5. The method of claim 1,
wherein acquiring the target processing unit comprises:
judging whether a data lake analysis system supports conversion between the first input format and the second output format;
if so, acquiring the target processing unit from a plurality of processing units of the data lake analysis system.
6. The method of claim 1,
wherein the data processing request further comprises a number of fragments, and acquiring the target processing unit comprises:
determining the number of target processing units according to the number of fragments;
and acquiring that number of target processing units.
7. The method of claim 1, wherein the data processing request further comprises a service mode, and the method further comprises:
if the service mode is a flow mode, acquiring the total data amount, determining virtual resource information according to the total data amount, and outputting the virtual resource information;
if the service mode is an instance mode, acquiring the number of target processing units, determining virtual resource information according to the number of target processing units, and outputting the virtual resource information.
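Claims 6 and 7 leave the exact mappings open, so the following sketch fills them with labeled assumptions: one processing unit per fragment, and virtual resource information that scales linearly with either the total data amount or the unit count. The function names, the mode strings `"flow"` and `"instance"`, and the rate parameters are hypothetical.

```python
def units_for_fragments(fragment_count: int) -> int:
    """Number of target processing units for a given number of fragments."""
    if fragment_count < 1:
        raise ValueError("fragment_count must be at least 1")
    # Assumption: one processing unit per fragment.
    return fragment_count


def virtual_resource_info(service_mode: str, *,
                          total_bytes: int = 0,
                          unit_count: int = 0,
                          resource_per_gib: float = 1.0,
                          resource_per_unit: float = 1.0) -> float:
    """Virtual resource amount under the two service modes of claim 7."""
    if service_mode == "flow":
        # Flow mode: determined by the total amount of data processed.
        return total_bytes / 2**30 * resource_per_gib
    if service_mode == "instance":
        # Instance mode: determined by the number of target processing units.
        return unit_count * resource_per_unit
    raise ValueError(f"unknown service mode: {service_mode!r}")
```

Under these assumed rates, a request with four fragments in instance mode would be charged for four units, while the same request in flow mode would be charged only for the bytes actually read.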
8. A data processing method, applied to a data lake analysis system that provides serverless data processing services to users, the method comprising:
acquiring a data processing request, wherein the data processing request comprises a first input format and a second output format;
obtaining a target processing unit from a plurality of processing units of the data lake analysis system; the target conversion information of the target processing unit is first conversion information, and the first conversion information is used for realizing conversion between the first input format and the second output format;
acquiring first data in a first input format from a data source according to the data processing request, and outputting the first data to the target processing unit so that the target processing unit converts the first data into second data in a second output format by using the first conversion information;
acquiring the second data from the target processing unit and outputting the second data;
wherein the data source comprises a cloud database provided by the data lake analysis system.
9. The method of claim 8,
wherein the data lake analysis system is specifically a data lake analysis system with separated storage and computation; the data lake analysis system comprises a storage cluster and a computing cluster, wherein the storage cluster comprises a plurality of data sources adopting different input formats, and the computing cluster comprises a plurality of computing nodes adopting different output formats;
the data lake analysis system further comprises a data scanning cluster, the data scanning cluster comprising a plurality of processing units; the data scanning cluster serves as a built-in module of the computing cluster and is deployed on the same node as the computing resources of the computing cluster; or, the data scanning cluster serves as an independent module of the computing cluster and is deployed on a different node from the computing resources of the computing cluster; or, the data scanning cluster is a cluster separate from the computing cluster.
10. A method of data processing, the method comprising:
acquiring a data processing request, wherein the data processing request comprises a first input format and a second output format;
acquiring first data in a first input format from a data source according to the data processing request;
outputting the first data in the first input format to a target processing unit so that the target processing unit converts the first data into second data in a second output format;
and acquiring the second data from the target processing unit and outputting the second data.
11. A data processing method, applied to a data lake analysis system and directed to a processing unit among a plurality of processing units of the data lake analysis system, wherein the processing unit comprises a plurality of different pieces of conversion information, and the different pieces of conversion information are used for realizing data conversion between different formats, the method comprising:
the processing unit acquires first data in a first input format;
if the target conversion information of the processing unit is first conversion information and the first conversion information is used for realizing the conversion between the first input format and the second output format, converting the first data into second data in the second output format by using the first conversion information;
the processing unit outputs the second data.
12. The method of claim 11, wherein before converting the first data into second data in the second output format by using the first conversion information, the method further comprises:
if the target conversion information of the processing unit is not the first conversion information, the processing unit modifies the target conversion information of the processing unit into the first conversion information.
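Seen from the processing unit's side (claims 11 and 12), the unit holds several pieces of conversion information and switches its target before converting when necessary. The sketch below assumes that conversion information can be modeled as a table of functions keyed by format pair; the class `ProcessingUnit`, its `process` method, and the two toy converters are hypothetical.

```python
import json
from typing import Callable, Dict, Tuple

FormatPair = Tuple[str, str]  # (input format, output format)


def json_to_csv_row(text: str) -> str:
    # Toy converter: one flat JSON object to one CSV row, keys in sorted order.
    record = json.loads(text)
    return ",".join(str(record[key]) for key in sorted(record))


def csv_row_to_json(text: str) -> str:
    # Toy converter: one CSV row to a JSON array of strings.
    return json.dumps(text.split(","))


class ProcessingUnit:
    """Holds several conversions; exactly one is the current target."""

    def __init__(self, conversions: Dict[FormatPair, Callable[[str], str]],
                 target: FormatPair) -> None:
        self.conversions = conversions
        self.target = target

    def process(self, first_data: str, input_format: str,
                output_format: str) -> str:
        first_conversion = (input_format, output_format)
        if first_conversion not in self.conversions:
            raise KeyError(f"unsupported conversion: {first_conversion}")
        if self.target != first_conversion:
            # Claim 12: modify the target conversion information first.
            self.target = first_conversion
        # Claim 11: convert the first data and output the second data.
        return self.conversions[self.target](first_data)
```

Because each unit carries every conversion it supports, switching the target is only a state change inside the unit, which is what lets a scheduler reuse an idle unit for a different format pair.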
13. A data processing apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire a data processing request, wherein the data processing request comprises a first input format and a second output format; and to acquire a target processing unit, wherein target conversion information of the target processing unit is first conversion information, and the first conversion information is used for realizing conversion between the first input format and the second output format;
a processing module, configured to acquire first data in the first input format from a data source according to the data processing request, and to output the first data to the target processing unit so that the target processing unit converts the first data into second data in the second output format by using the first conversion information;
and to acquire the second data from the target processing unit and output the second data.
14. The apparatus of claim 13,
wherein the acquisition module, when acquiring the target processing unit, is specifically configured to:
acquire target conversion information of a plurality of processing units of a data lake analysis system, and select a processing unit from the plurality of processing units as the target processing unit by using the target conversion information.
15. The apparatus of claim 13, wherein the processing module is further configured to:
if the target conversion information of the target processing unit is the first conversion information, keep the target conversion information of the target processing unit unchanged according to the first input format and the second output format; or,
if the target conversion information of the target processing unit is second conversion information, modify the target conversion information of the target processing unit into the first conversion information according to the first input format and the second output format.
16. A data processing apparatus, characterized by comprising:
a processor and a machine-readable storage medium having stored thereon a plurality of computer instructions, wherein the processor, when executing the computer instructions, performs:
acquiring a data processing request, wherein the data processing request comprises a first input format and a second output format;
acquiring a target processing unit, wherein target conversion information of the target processing unit is first conversion information, and the first conversion information is used for realizing conversion between the first input format and the second output format;
acquiring first data in a first input format from a data source according to the data processing request, and outputting the first data to the target processing unit so that the target processing unit converts the first data into second data in a second output format by using the first conversion information;
and acquiring the second data from the target processing unit and outputting the second data.
CN201910312700.7A 2019-04-18 2019-04-18 Data processing method, device and equipment Pending CN111831713A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910312700.7A CN111831713A (en) 2019-04-18 2019-04-18 Data processing method, device and equipment
PCT/CN2020/084423 WO2020211717A1 (en) 2019-04-18 2020-04-13 Data processing method, apparatus and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910312700.7A CN111831713A (en) 2019-04-18 2019-04-18 Data processing method, device and equipment

Publications (1)

Publication Number Publication Date
CN111831713A (en) 2020-10-27

Family

ID=72837041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910312700.7A Pending CN111831713A (en) 2019-04-18 2019-04-18 Data processing method, device and equipment

Country Status (2)

Country Link
CN (1) CN111831713A (en)
WO (1) WO2020211717A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113127226A (en) * 2021-03-12 2021-07-16 创业慧康科技股份有限公司 Method for generating data conversion model, data conversion method and device
CN113312242A (en) * 2021-06-29 2021-08-27 中国农业银行股份有限公司 Interface information management method, device, equipment and storage medium
CN113568938A (en) * 2021-08-04 2021-10-29 北京百度网讯科技有限公司 Data stream processing method and device, electronic equipment and storage medium
CN114116842A (en) * 2021-11-25 2022-03-01 上海柯林布瑞信息技术有限公司 Multi-dimensional medical data real-time acquisition method and device, electronic equipment and storage medium

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08149227A (en) * 1994-11-18 1996-06-07 Fujitsu Ltd Exchange, gate exchange and network
GB2527383A (en) * 2014-06-18 2015-12-23 Alfresco Software Inc Content transformation
WO2016127422A1 (en) * 2015-02-15 2016-08-18 华为技术有限公司 System, device and method for processing data
CN106161178A (en) * 2015-03-24 2016-11-23 阿里巴巴集团控股有限公司 A kind of method and apparatus accessing instant messaging network
CN107423334A (en) * 2017-04-24 2017-12-01 云宏信息科技股份有限公司 A kind of automatic data migration method and device for supporting multi-data source
CN107493176A (en) * 2017-09-25 2017-12-19 中国联合网络通信集团有限公司 A kind of charging method and system
CN107636655A (en) * 2015-08-28 2018-01-26 华为技术有限公司 Data are provided in real time to service(DaaS)System and method
CN108241722A (en) * 2016-12-23 2018-07-03 北京金山云网络技术有限公司 A kind of data processing system, method and device
US10027559B1 (en) * 2015-06-24 2018-07-17 Amazon Technologies, Inc. Customer defined bandwidth limitations in distributed systems
CN108363737A (en) * 2018-01-19 2018-08-03 深圳市宏电技术股份有限公司 A kind of conversion method of data format, device and equipment
WO2018153218A1 (en) * 2017-02-27 2018-08-30 腾讯科技(深圳)有限公司 Resource processing method, related apparatus and communication system
CN108694045A (en) * 2017-02-14 2018-10-23 北京国双科技有限公司 A kind of data processing method and device
CN109196865A (en) * 2017-03-27 2019-01-11 华为技术有限公司 A kind of data processing method and terminal
CN109343891A (en) * 2017-08-01 2019-02-15 阿里巴巴集团控股有限公司 System, the method and device of data processing
CN109413154A (en) * 2018-09-26 2019-03-01 平安普惠企业管理有限公司 Conversion method of data format, device, computer equipment and storage medium
US20190089801A1 (en) * 2017-09-18 2019-03-21 Thomson Licensing Method and device for transforming data

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6407680B1 (en) * 2000-12-22 2002-06-18 Generic Media, Inc. Distributed on-demand media transcoding system and method
CN1913492A (en) * 2006-08-08 2007-02-14 恒生电子股份有限公司 Data exchange device, system and method
GB201615747D0 (en) * 2016-09-15 2016-11-02 Gb Gas Holdings Ltd System for data management in a large scale data repository
US10671631B2 (en) * 2016-10-31 2020-06-02 Informatica Llc Method, apparatus, and computer-readable medium for non-structured data profiling
US10540364B2 (en) * 2017-05-02 2020-01-21 Home Box Office, Inc. Data delivery architecture for transforming client response data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
祝林编著: "《智能制造的探索与实践》", 30 November 2017, 西南交通大学出版社, pages: 140 - 141 *
陈红等: "云计算平台下的计费机制研究", 《计算机科学》, vol. 38, no. 8, 31 August 2011 (2011-08-31) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113127226A (en) * 2021-03-12 2021-07-16 创业慧康科技股份有限公司 Method for generating data conversion model, data conversion method and device
CN113127226B (en) * 2021-03-12 2024-05-24 创业慧康科技股份有限公司 Method for generating data conversion model, data conversion method and device
CN113312242A (en) * 2021-06-29 2021-08-27 中国农业银行股份有限公司 Interface information management method, device, equipment and storage medium
CN113312242B (en) * 2021-06-29 2024-05-17 中国农业银行股份有限公司 Interface information management method, device, equipment and storage medium
CN113568938A (en) * 2021-08-04 2021-10-29 北京百度网讯科技有限公司 Data stream processing method and device, electronic equipment and storage medium
CN113568938B (en) * 2021-08-04 2023-11-14 北京百度网讯科技有限公司 Data stream processing method and device, electronic equipment and storage medium
CN114116842A (en) * 2021-11-25 2022-03-01 上海柯林布瑞信息技术有限公司 Multi-dimensional medical data real-time acquisition method and device, electronic equipment and storage medium
CN114116842B (en) * 2021-11-25 2023-05-19 上海柯林布瑞信息技术有限公司 Multidimensional medical data real-time acquisition method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2020211717A1 (en) 2020-10-22

Similar Documents

Publication Publication Date Title
CN111831713A (en) Data processing method, device and equipment
US9965209B2 (en) Large-scale, dynamic graph storage and processing system
CN111258978B (en) Data storage method
CN112527848B (en) Report data query method, device and system based on multiple data sources and storage medium
US9712612B2 (en) Method for improving mobile network performance via ad-hoc peer-to-peer request partitioning
JP2016515228A (en) Data stream splitting for low latency data access
WO2016058488A1 (en) Method and device for providing sdk files
US10866960B2 (en) Dynamic execution of ETL jobs without metadata repository
CN108363741B (en) Big data unified interface method, device, equipment and storage medium
CN111026493B (en) Interface rendering processing method and device
US20210011847A1 (en) Optimized sorting of variable-length records
CN111949856A (en) Object storage query method and device based on web
CN110781159B (en) Ceph directory file information reading method and device, server and storage medium
CN111400301B (en) Data query method, device and equipment
CN110866052A (en) Data analysis method, device and equipment
CN112307061A (en) Method and device for querying data
JP2021508867A (en) Systems, methods and equipment for querying databases
CN108319604B (en) Optimization method for association of large and small tables in hive
CN112506887A (en) Vehicle terminal CAN bus data processing method and device
CN110909072B (en) Data table establishment method, device and equipment
WO2023071566A1 (en) Data processing method and apparatus, computer device, computer-readable storage medium, and computer program product
WO2020211718A1 (en) Data processing method, apparatus and device
CN110928895A (en) Data query method, data table establishing method, device and equipment
US20180131756A1 (en) Method and system for affinity load balancing
CN111061557A (en) Method and device for balancing distributed memory database load

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination