CN116049155A - Method, device and storage medium for processing data - Google Patents

Info

Publication number
CN116049155A
Authority
CN
China
Prior art keywords
data
data table
monitoring rule
quality monitoring
identification information
Prior art date
Legal status
Pending
Application number
CN202211699078.8A
Other languages
Chinese (zh)
Inventor
齐普军
Current Assignee
Shenzhen Huawei Cloud Computing Technology Co ltd
Original Assignee
Shenzhen Huawei Cloud Computing Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Huawei Cloud Computing Technology Co ltd
Priority to CN202211699078.8A
Publication of CN116049155A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method, a device, and a storage medium for processing data, and belongs to the field of computers. The method includes the following steps: receiving a data conversion request, where the data conversion request includes identification information of a first data table, identification information of a second data table, and a first quality monitoring rule; reading a plurality of first data stored in the first data table based on the identification information of the first data table; determining, among the plurality of first data, the first data that conform to the first quality monitoring rule; converting the conforming first data into at least one second data; and storing the at least one second data in the second data table based on the identification information of the second data table. The method and the device prevent quality problems in a source data table from being carried over into the destination data table when the source data table is converted into the destination data table.

Description

Method, device and storage medium for processing data
Technical Field
The present invention relates to the field of computers, and in particular, to a method, an apparatus, and a storage medium for processing data.
Background
Database systems often include multiple data tables, each of which stores data. One data table in a database system may be converted from another data table in the same system. For example, a first data table may be converted from a second data table, which in turn is converted from a third data table, where the first, second, and third data tables are different data tables in the database system.
If the data in the third data table has a quality problem, the second data table converted from it may inherit the problem, and so may the first data table converted from the second data table. Because the second data table is converted from the third data table, the third data table can be called a source data table and the second data table a destination data table. Likewise, because the first data table is converted from the second data table, the second data table can be called a source data table and the first data table a destination data table.
Consequently, with current approaches, when a source data table that has a quality problem is converted into a destination data table, the quality problem is carried over into the destination data table along with the conversion.
Disclosure of Invention
The application provides a method, a device, and a storage medium for processing data, so as to prevent quality problems in a source data table from being carried over into a destination data table when the source data table is converted into the destination data table. The technical solution is as follows:
In a first aspect, the present application provides a method of processing data. In the method, a data conversion request is received, where the data conversion request includes identification information of a first data table, identification information of a second data table, and a first quality monitoring rule. A plurality of first data stored in the first data table are read based on the identification information of the first data table. The first data that conform to the first quality monitoring rule are determined among the plurality of first data. The conforming first data are converted into at least one second data, and the at least one second data is stored in the second data table based on the identification information of the second data table.
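The method of the first aspect can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: an in-memory dict of lists (the hypothetical `tables`) stands in for the database system, table names stand in for identification information, and the quality monitoring rule and the conversion are passed in as plain functions (`first_rule` and `convert`, both hypothetical).

```python
from typing import Callable, Dict, List

Row = Dict[str, object]  # one record; an in-memory stand-in for a table row


def process_conversion_request(
    tables: Dict[str, List[Row]],
    source_id: str,
    dest_id: str,
    first_rule: Callable[[Row], bool],
    convert: Callable[[Row], List[Row]],
) -> int:
    """Read, filter, convert, and store; returns the number of second data stored."""
    # Read the first data using the first table's identification information.
    first_data = tables[source_id]
    # Keep only the first data that conform to the first quality monitoring rule.
    conforming = [row for row in first_data if first_rule(row)]
    # Convert each conforming row into zero or more second data.
    second_data = [out for row in conforming for out in convert(row)]
    # Store the second data using the second table's identification information.
    tables.setdefault(dest_id, []).extend(second_data)
    return len(second_data)


if __name__ == "__main__":
    tables = {"table1": [{"id": 1}, {"id": None}, {"id": 2}]}
    stored = process_conversion_request(
        tables, "table1", "table2",
        first_rule=lambda r: r["id"] is not None,  # e.g. a not_null rule
        convert=lambda r: [{"id": r["id"] * 10}],  # a toy conversion
    )
    print(stored, tables["table2"])  # 2 [{'id': 10}, {'id': 20}]
```

The row with a null id never reaches `convert`, so it cannot propagate into `table2`.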
Because the data conversion request includes the first quality monitoring rule, only the first data that conform to that rule are converted after the plurality of first data are read from the first data table. First data with quality problems are therefore excluded from the conversion, and only first data without quality problems are converted. The resulting at least one second data is then stored in the second data table based on the identification information of the second data table, so first data with quality problems in the first data table are prevented from being carried over into the second data table along with the conversion.
In one possible implementation, the data conversion request further includes a second quality monitoring rule. The second data that conform to the second quality monitoring rule are determined among the at least one second data, and those conforming second data are stored in the second data table based on the identification information of the second data table.
The at least one second data is produced by converting first data from the first data table, and the conversion itself can introduce quality problems. By determining which of the at least one second data conform to the second quality monitoring rule and storing only those in the second data table, second data that acquired quality problems during the conversion are kept out of the second data table.
In another possible implementation, the data conversion request includes a data read statement and a data write statement; the data read statement includes the first quality monitoring rule, and the data write statement includes the second quality monitoring rule.
If the quality of the data in the first data table were checked separately, the plurality of first data would have to be read from the first data table in a separate pass and then checked against the quality monitoring rule. In this application, the data read statement includes the first quality monitoring rule and the data write statement includes the second quality monitoring rule. When data are read from the first data table based on the data read statement, the first quality monitoring rule is used to determine the first data in the first data table that conform to it; before second data are written to the second data table based on the data write statement, the second quality monitoring rule is used to determine which of the second data to be stored conform to it. Converting the first data table into the second data table already requires reading the plurality of first data from the first data table, and the data conforming to the quality monitoring rule are determined during that same read. No additional read of the first data table is needed for a standalone quality check, which saves computing resources.
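The resource saving described above comes from applying the first quality monitoring rule during the read that the conversion needs anyway. The sketch below counts table scans to contrast a separate check-then-convert flow with the fused flow. `CountingTable`, `check_then_convert`, and `fused_convert` are hypothetical names, and the scan count is only a rough proxy for the computing resources the text refers to.

```python
from typing import Callable, Iterator, List


class CountingTable:
    """Hypothetical table wrapper that counts full scans of its rows."""

    def __init__(self, rows: List[int]) -> None:
        self.rows = rows
        self.scans = 0

    def scan(self) -> Iterator[int]:
        self.scans += 1  # incremented when a scan actually starts
        yield from self.rows


def check_then_convert(table: CountingTable, rule: Callable[[int], bool],
                       convert: Callable[[int], int]) -> List[int]:
    # Pass 1: a standalone quality check records positions of failing rows.
    failing = {i for i, r in enumerate(table.scan()) if not rule(r)}
    # Pass 2: the conversion job reads the table again and skips those rows.
    return [convert(r) for i, r in enumerate(table.scan()) if i not in failing]


def fused_convert(table: CountingTable, rule: Callable[[int], bool],
                  convert: Callable[[int], int]) -> List[int]:
    # One pass: the rule is applied while the conversion reads the rows.
    return [convert(r) for r in table.scan() if rule(r)]


if __name__ == "__main__":
    a, b = CountingTable([1, -2, 3]), CountingTable([1, -2, 3])
    print(check_then_convert(a, lambda r: r > 0, lambda r: r * 2), a.scans)  # [2, 6] 2
    print(fused_convert(b, lambda r: r > 0, lambda r: r * 2), b.scans)       # [2, 6] 1
```

Both flows produce the same second data; the fused flow reads the source table half as often.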
In another possible implementation, the data conversion request further includes first indication information, which indicates the location of the first quality monitoring rule in the data conversion request. The first quality monitoring rule is read from the data conversion request at the location indicated by the first indication information, so the rule can be obtained accurately.
In a second aspect, the present application provides an apparatus for processing data for performing the method of the first aspect or any one of the possible implementations of the first aspect. In particular, the apparatus comprises means for performing the method of the first aspect or any one of the possible implementations of the first aspect.
In a third aspect, the present application provides an apparatus for processing data, comprising at least one processor and a memory, the at least one processor being configured to couple with the memory, read and execute instructions in the memory, to implement the method of the first aspect or any one of the possible implementations of the first aspect.
In a fourth aspect, the present application provides a computer program product comprising a computer program stored in a computer readable storage medium and loaded by a processor to implement the method of the first aspect or any possible implementation of the first aspect.
In a fifth aspect, the present application provides a computer readable storage medium storing a computer program to be loaded by a processor for performing the method of the first aspect or any possible implementation of the first aspect.
In a sixth aspect, the present application provides a chip comprising a memory for storing computer instructions and a processor for calling and executing the computer instructions from the memory to perform the method of the first aspect or any possible implementation manner of the first aspect.
Drawings
FIG. 1 is a schematic diagram of a network architecture according to an embodiment of the present application;
FIG. 2 is a schematic diagram of another network architecture according to an embodiment of the present application;
FIG. 3 is a flowchart of a method for processing data according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an input interface according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an apparatus for processing data according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a computing device according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a computing device cluster according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of another computing device cluster according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, an embodiment of the present application provides a network architecture 100, the network architecture 100 including a compute engine 101 and a database system 102, the compute engine 101 in communication with the database system 102.
Database system 102 includes a plurality of data tables, each for holding data.
The compute engine 101 may send a data access request to the database system 102, and the database system 102 accesses a data table in the database system 102 based on the data access request.
For example, the compute engine 101 sends a data read request to the database system 102, the data read request including identification information of the data table to be read. The database system 102 contains the data table to be read.
The database system 102 receives the data read request, reads the data stored in the data table to be read based on the identification information included in the request, and sends a data read response including the read data to the compute engine 101. The compute engine 101 receives the data read response.
As another example, the compute engine 101 sends a data write request to the database system 102, the data write request including identification information of the data table to be written and the data to be written. The database system 102 contains the data table to be written.
The database system 102 receives the data write request, stores the data to be written in the data table to be written based on the identification information included in the request, and sends a data write response to the compute engine 101, the data write response including indication information identifying that the write succeeded. The compute engine 101 receives the data write response.
As a further example, the compute engine 101 sends a data conversion request to the database system 102, the data conversion request including identification information of a first data table and identification information of a second data table. The database system 102 contains the first data table and the second data table.
The database system 102 receives the data conversion request; reads a plurality of first data stored in the first data table based on the identification information of the first data table included in the request; converts the plurality of first data into a plurality of second data and stores them in the second data table based on the identification information of the second data table; and sends a data conversion response to the compute engine 101, the data conversion response including indication information identifying that the conversion succeeded. The compute engine 101 receives the data conversion response.
In some embodiments, the number of compute engines 101 in the network architecture 100 is at least one. For example, referring to FIG. 2, the network architecture 100 includes two compute engines 101, the two compute engines 101 being Spark and Hive.
In some embodiments, the database system 102 includes at least one script. Each script includes identification information of a source data table, identification information of a destination data table, and implementation code for a conversion operation. The implementation code performs the conversion operation, which converts the source data table identified by the source identification information into the destination data table identified by the destination identification information.
For example, the database system includes a first script, a second script, and so on. In the first script, the identification information of the source data table is that of the first data table, the identification information of the destination data table is that of the second data table, and the implementation code implements a first conversion operation; the first script is thus used to perform the first conversion operation. Similarly, the second script includes implementation code for a second conversion operation and is used to perform the second conversion operation.
Each script in the database system includes identification information of at least one source data table and is used to indicate the conversion of the at least one source data table into a destination data table.
In this way, when the database system 102 receives a data conversion request, it treats the identification information of the first data table in the request as the identification information of the source data table and the identification information of the second data table as that of the destination data table, obtains the first script, which includes both, converts the plurality of first data stored in the first data table into a plurality of second data using the implementation code in the first script, and stores the plurality of second data in the second data table.
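Selecting the script by its source and destination identifiers, as described above, amounts to a lookup keyed on the pair of table identifiers. In this minimal sketch the script's implementation code is modeled as a Python function; `Script`, `find_script`, and `run_conversion` are hypothetical names, and how a real database system stores and executes scripts is not specified by the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Script:
    source_id: str                          # identification information of the source table
    dest_id: str                            # identification information of the destination table
    code: Callable[[List[Row]], List[Row]]  # stands in for the implementation code


def find_script(scripts: List[Script], source_id: str, dest_id: str) -> Script:
    """Return the script whose source and destination identifiers both match."""
    for script in scripts:
        if script.source_id == source_id and script.dest_id == dest_id:
            return script
    raise LookupError(f"no script converts {source_id!r} into {dest_id!r}")


def run_conversion(tables: Dict[str, List[Row]], scripts: List[Script],
                   source_id: str, dest_id: str) -> None:
    script = find_script(scripts, source_id, dest_id)
    second_data = script.code(tables[source_id])        # apply the conversion operation
    tables.setdefault(dest_id, []).extend(second_data)  # store into the destination table


if __name__ == "__main__":
    scripts = [
        Script("table1", "table2", lambda rows: [{"id": r["id"]} for r in rows]),
        Script("table2", "table3", lambda rows: rows),
    ]
    tables = {"table1": [{"id": 1, "name": "a"}]}
    run_conversion(tables, scripts, "table1", "table2")
    print(tables["table2"])  # [{'id': 1}]
```

Matching on both identifiers matters: a script from `table1` to some other table would not be chosen for a `table1` to `table2` request.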
To avoid storing second data with quality problems in the second data table, any of the following embodiments can be used to prevent quality problems in a source data table from being carried over into the destination data table when the source data table is converted into the destination data table.
Referring to fig. 3, an embodiment of the present application provides a method 300 for processing data, where the method 300 is applied to the network architecture 100 shown in fig. 1 or fig. 2, and the method 300 includes the following steps 301-305.
Step 301: the database system receives a data conversion request including identification information of a first data table, identification information of a second data table, and a first quality monitoring rule.
The computing engine sends a data conversion request to the database system requesting the database system to convert the plurality of first data stored in the first data table to a plurality of second data, and store the plurality of second data in the second data table.
In some embodiments, the data conversion request further includes first indication information indicating a location of the first quality-monitoring rule in the data conversion request. Thus, after the database system receives the data conversion request, the first indication information is identified from the data conversion request, the position of the first quality monitoring rule in the data conversion request is determined based on the first indication information, and the first quality monitoring rule is acquired from the data conversion request based on the position.
In some embodiments, the data conversion request includes a plurality of data access statements including a data read statement including identification information of the first data table and the first quality monitoring rule and a data write statement including identification information of the second data table.
The data read statement is for instructing the database system to read data from the first data table and the data write statement is for instructing the database system to write data to the second data table.
In some embodiments, the data read statement further includes first indication information for indicating a location of the first quality monitoring rule in the data read statement.
In some embodiments, the plurality of data access statements are structured query language (SQL) statements. When a user needs to convert the data in the first data table into data in the second data table, the user can input a plurality of SQL statements in a compute engine; the compute engine obtains these SQL statements and sends the database system a data conversion request that includes them.
For example, referring to FIG. 4, the compute engine displays an input interface in which the user can input a plurality of SQL statements, and the compute engine reads the SQL statements from the interface.
Assume that the data conversion request includes a plurality of SQL statements, three of which are data read statements. The first is: "10"."id" AS "id" /*+validation not_null,unique*/. The second is: "10"."name" AS "name" /*+validation int[xx],id_card*/. The third is: SELECT "id", "name" FROM "table1".
The data read statements include the identification information "table1" of the first data table and the first indication information "validation". In the first data read statement, the first quality monitoring rule "not_null,unique" follows the first indication information "validation"; this rule indicates that the id column in table1 cannot be empty and that each value in the id column is unique.
In the second data read statement, the first quality monitoring rule "int[xx],id_card" follows the first indication information "validation"; this rule indicates that each value in the name column in table1 is an integer xx digits long.
The data write statement includes JOIN (SELECT "id", "name" FROM "table2") AS "r1" ON "10"."id" = "r1"."id", and thus includes the identification information "table2" of the second data table. The complete query is as follows:
```sql
SELECT
  "10"."id" AS "id" /*+validation not_null,unique*/,
  "10"."name" AS "name" /*+validation int[xx],id_card*/,
  "r1"."id" AS "id1",
  "r1"."name" AS "name1"
FROM
  (SELECT "id", "name" FROM "table1") AS "10"
  JOIN (SELECT "id", "name" FROM "table2") AS "r1" ON "10"."id" = "r1"."id";
```
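The hints in this query follow a regular shape: a qualified column, an `AS` alias, then a `/*+validation ...*/` comment in which "validation" acts as the first indication information. Assuming that shape (the text does not give a grammar for it), a regular expression can recover the first quality monitoring rules for each column. `extract_rules` and the `HINT` pattern are hypothetical, and the query text is the example above with its quoting normalized.

```python
import re
from typing import Dict, List

# Assumed hint shape, taken from the example query:
#   "<alias>"."<column>" AS "<name>" /*+validation <rule>,<rule>*/
# The keyword "validation" plays the role of the first indication information.
HINT = re.compile(
    r'"(?P<alias>[^"]+)"\."(?P<column>[^"]+)"\s+AS\s+"(?P<name>[^"]+)"'
    r'\s*/\*\+validation\s+(?P<rules>.*?)\*/'
)


def extract_rules(sql: str) -> Dict[str, List[str]]:
    """Map each hinted column to the quality monitoring rules written after 'validation'."""
    rules: Dict[str, List[str]] = {}
    for m in HINT.finditer(sql):
        rules[m.group("column")] = [r.strip() for r in m.group("rules").split(",") if r.strip()]
    return rules


QUERY = '''
SELECT
  "10"."id" AS "id" /*+validation not_null,unique*/,
  "10"."name" AS "name" /*+validation int[xx],id_card*/,
  "r1"."id" AS "id1",
  "r1"."name" AS "name1"
FROM
  (SELECT "id", "name" FROM "table1") AS "10"
  JOIN (SELECT "id", "name" FROM "table2") AS "r1" ON "10"."id" = "r1"."id"
'''

if __name__ == "__main__":
    print(extract_rules(QUERY))
    # {'id': ['not_null', 'unique'], 'name': ['int[xx]', 'id_card']}
```

Columns without a hint, such as "id1" and "name1", simply produce no entry, so only the columns that carry a rule are checked.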
In some embodiments, the data write statement further includes a second quality monitoring rule. Optionally, the data writing statement further includes second indication information, where the second indication information is used to indicate a location of the second quality monitoring rule in the data writing statement.
The first quality monitoring rule and/or the second quality monitoring rule is used for monitoring the integrity, validity, consistency, timeliness and/or accuracy of the data, etc.
Step 302: the database system reads a plurality of first data stored in the first data table based on the identification information of the first data table.
The data read statement includes the identification information of the first data table; by executing the data read statement, the database system reads the plurality of first data stored in the first data table based on that identification information.
In some embodiments, the data conversion request includes at least one data read statement, each data read statement for reading at least one column of data from the first data table. For any one of the data read statements, the data read statement includes identification information for at least one column of data in the first data table, the database system executes the data read statement to read the at least one column of data from the first data table based on the identification information for the at least one column of data included in the data read statement.
Step 303: the database system determines first data of the plurality of first data that meets a first quality monitoring rule.
In some embodiments, for any one data read statement, the data read statement includes identification information, first indication information, and first quality monitoring rules for at least one column of data in the first data table. The database system identifies first indication information from the data read statement, and reads a first quality monitoring rule from the data read statement based on the first indication information.
In this way, after reading the at least one column of data from the first data table based on the identification information of the at least one column of data included in the data read statement, the database system determines, based on the first quality monitoring rule it has read, the first data in the at least one column of data that conform to the first quality monitoring rule.
For example, the data read statement "10"."id" AS "id" /*+validation not_null,unique*/ includes the identification information "id" of a column of data in the first data table "table1", the first indication information "validation", and the first quality monitoring rule "not_null,unique". The database system identifies the first indication information "validation" in the data read statement and, based on it, reads the first quality monitoring rule "not_null,unique" from the statement.
The database system then reads, from the first data table "table1", the column of data identified by "id", where each row of the column is an id. Based on the first quality monitoring rule "not_null,unique", it determines the ids in the column that conform to the rule: each conforming id is non-empty, and the conforming ids are all different from one another.
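The rule check in this example can be sketched as a per-column filter. The rule semantics below are assumptions: the text does not say whether a duplicated id should be dropped entirely or kept once, so this sketch rejects every copy of a duplicated value; `int[n]` is read as a non-negative integer with exactly n digits, with the width supplied by the caller because the source gives it only as "xx"; and the `id_card` rule, whose meaning the text does not define, is deliberately not modeled. `conforming_values` is a hypothetical name.

```python
import re
from collections import Counter
from typing import List, Optional, Sequence

INT_RULE = re.compile(r"int\[(\d+)\]")  # e.g. "int[6]"; the source leaves the width as "xx"


def conforming_values(column: Sequence[Optional[object]], rules: List[str]) -> List[object]:
    """Return the values of one column that satisfy every rule in `rules`.

    Assumed semantics (illustrative, not taken from the source):
      not_null -> the value is not None
      unique   -> the value occurs exactly once in the column (every duplicate is rejected)
      int[n]   -> the value is a non-negative int with exactly n decimal digits
    """
    for rule in rules:  # reject rules this sketch does not model, such as id_card
        if rule not in ("not_null", "unique") and not INT_RULE.fullmatch(rule):
            raise ValueError(f"unsupported rule: {rule}")

    counts = Counter(v for v in column if v is not None)

    def conforms(v: Optional[object]) -> bool:
        for rule in rules:
            if rule == "not_null" and v is None:
                return False
            if rule == "unique" and counts[v] != 1:  # None is never counted, so it fails too
                return False
            width = INT_RULE.fullmatch(rule)
            if width and not (isinstance(v, int) and not isinstance(v, bool)
                              and v >= 0 and len(str(v)) == int(width.group(1))):
                return False
        return True

    return [v for v in column if conforms(v)]


if __name__ == "__main__":
    ids = [1, None, 2, 2, 3]
    print(conforming_values(ids, ["not_null", "unique"]))  # [1, 3]
```

Validating the rule list before touching any data means an unsupported rule fails loudly rather than silently letting every value through.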
Step 304: the database system converts the first data that meets the first quality monitoring rule into at least one second data.
In some embodiments, the data conversion request further includes a data write statement that includes the identification information of the second data table, and the database system further includes at least one script. Each script includes identification information of a source data table, identification information of a destination data table, and implementation code for a conversion operation; the script performs the conversion operation, converting the source data table identified by the source identification information into the destination data table identified by the destination identification information.
In step 304, the database system treats the identification information of the first data table as that of the source data table and the identification information of the second data table as that of the destination data table, and obtains, from the at least one script, the script that includes both. The conversion operation is determined from the implementation code in the obtained script, and the first data conforming to the first quality monitoring rule are converted into at least one second data through that conversion operation.
In some embodiments, the conversion operation may be upper/lower-case conversion, sorting, and/or data stitching, among others.
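The conversion operations named above can be sketched as small functions over lists of rows. These are illustrative guesses at what the operations do, not the patent's definitions: "data stitching" is modeled as an inner join on a key column because the example query joins table1 and table2 on id, and the "1" suffix on right-hand columns mirrors the id1 and name1 aliases in that query. `to_upper`, `order_by`, and `stitch` are hypothetical names.

```python
from typing import Dict, List

Row = Dict[str, str]


def to_upper(rows: List[Row], column: str) -> List[Row]:
    """Case conversion: copy each row with `column` upper-cased."""
    return [{**row, column: row[column].upper()} for row in rows]


def order_by(rows: List[Row], column: str) -> List[Row]:
    """Sorting: return the rows ordered by `column` (stable, ascending)."""
    return sorted(rows, key=lambda row: row[column])


def stitch(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Data stitching, modeled here as an inner join on `key`.

    Right-hand columns get a "1" suffix, mirroring the id1/name1 aliases in the
    example query. If several right rows share a key, the last one wins.
    """
    index = {row[key]: row for row in right}
    return [
        {**row, **{f"{k}1": v for k, v in index[row[key]].items()}}
        for row in left
        if row[key] in index
    ]


if __name__ == "__main__":
    people = [{"id": "2", "name": "bob"}, {"id": "1", "name": "amy"}]
    print(order_by(to_upper(people, "name"), "id"))
    # [{'id': '1', 'name': 'AMY'}, {'id': '2', 'name': 'BOB'}]
    print(stitch(people, [{"id": "1", "name": "x"}], "id"))
    # [{'id': '1', 'name': 'amy', 'id1': '1', 'name1': 'x'}]
```

Each function returns new rows instead of mutating its input, so the conforming first data stay intact if a later step fails.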
Step 305: the database system stores the at least one second data in the second data table based on the identification information of the second data table.
In some embodiments, the data conversion request further includes a data write statement that includes the identification information of the second data table, and the database system executes the data write statement to store the at least one second data in the second data table identified by that information.
In some embodiments, the data write statement further includes a second quality monitoring rule. The database system determines the second data, among the at least one second data, that conform to the second quality monitoring rule and stores those conforming second data in the second data table based on the identification information of the second data table.
Second data that do not conform to the second quality monitoring rule are data whose quality problems arose during the conversion. Storing only the conforming second data therefore keeps second data with quality problems out of the second data table.
In some embodiments, the data write statement further includes second indication information. The database system identifies the second indication information in the data write statement, determines from it the location of the second quality monitoring rule in the data write statement, and obtains the second quality monitoring rule from that location. The database system then determines the second data, among the at least one second data, that conform to the second quality monitoring rule and stores them in the second data table based on the identification information of the second data table.
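The write-side check can be sketched as a filter applied between conversion and storage. The example shows why the second quality monitoring rule is useful even when the source rows look fine: a hypothetical conversion `parse_age` turns unparseable text into `None`, and the second rule rejects those rows before they reach the second data table. Returning the rejected rows is an illustrative addition, since the text only says that non-conforming second data are not stored; all names here are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, Optional[object]]


def save_with_write_rule(
    tables: Dict[str, List[Row]],
    dest_id: str,
    second_data: List[Row],
    second_rule: Callable[[Row], bool],
) -> Tuple[int, List[Row]]:
    """Store only second data that meet the second rule; return (stored count, rejected rows)."""
    accepted = [row for row in second_data if second_rule(row)]
    rejected = [row for row in second_data if not second_rule(row)]
    tables.setdefault(dest_id, []).extend(accepted)
    return len(accepted), rejected


def parse_age(row: Row) -> Row:
    """Hypothetical conversion that yields None when the source text is not a number."""
    text = str(row["age"])
    return {"id": row["id"], "age": int(text) if text.isdigit() else None}


if __name__ == "__main__":
    first_data = [{"id": 1, "age": "30"}, {"id": 2, "age": "thirty"}]  # both non-null
    second_data = [parse_age(r) for r in first_data]
    tables: Dict[str, List[Row]] = {}
    stored, rejected = save_with_write_rule(tables, "table2", second_data,
                                            lambda r: r["age"] is not None)
    print(stored, tables["table2"], rejected)
    # 1 [{'id': 1, 'age': 30}] [{'id': 2, 'age': None}]
```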
In an embodiment of the present application, a database system receives a data conversion request, where the data conversion request includes identification information of a first data table, identification information of a second data table, and a first quality monitoring rule. And reading a plurality of first data stored in the first data table based on the identification information of the first data table. First data of the plurality of first data that meets a first quality monitoring rule is determined. The first data meeting the first quality monitoring rule is converted into at least one second data. The first data meeting the first quality monitoring rule is converted, so that the first data with quality problems in the first data table can be excluded, and the first data without quality problems in the first data table can be converted. And then the database system stores the at least one second data into the second data table based on the identification information of the second data table, so that the first data with quality problems in the first data table can be prevented from being transferred into the second data table along with the conversion. In addition, the database system further determines second data conforming to a second quality monitoring rule in the at least one second data before saving the at least one second data to the second data table, and saves the second data conforming to the second quality monitoring rule to the second data table. The database system determines the second data which accords with the second quality monitoring rule in the at least one second data, and stores the second data which accords with the second quality monitoring rule into a second data table to avoid the second data with quality problems in the second data table. 
Moreover, because the first quality monitoring rule is carried in the data conversion request itself, the database system, after receiving the request, reads the first data from the first data table, determines which of the read first data conform to the first quality monitoring rule, and converts the conforming first data into second data. The first data table therefore needs to be read only once: the same read serves both to identify the first data without quality problems (using the first quality monitoring rule) and to convert the first data table into the second data table, which saves computing resources.
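To illustrate the single-read argument, the sketch below counts table scans. It assumes, for contrast only, that without the embedded rule a separate quality-check job would scan the first data table before the conversion job scans it again. `CountingTable` and both functions are hypothetical.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]
Rule = Callable[[Row], bool]
Convert = Callable[[Row], List[Row]]


class CountingTable:
    """Hypothetical in-memory table that counts how many scans have started."""

    def __init__(self, rows: List[Row]):
        self._rows = rows
        self.scans = 0

    def scan(self) -> Iterator[Row]:
        # Generator: the counter is incremented when iteration begins.
        self.scans += 1
        yield from self._rows


def check_and_convert_in_one_pass(table: CountingTable, rule: Rule, convert: Convert) -> List[Row]:
    # Each row read is checked against the rule and, if it conforms, converted right away.
    return [out for row in table.scan() if rule(row) for out in convert(row)]


def check_then_convert_separately(table: CountingTable, rule: Rule, convert: Convert) -> List[Row]:
    # Contrast: a standalone quality check scans the table, then the conversion scans it again.
    ok = {i for i, row in enumerate(table.scan()) if rule(row)}
    return [out for i, row in enumerate(table.scan()) if i in ok for out in convert(row)]


if __name__ == "__main__":
    rows = [{"v": 1}, {"v": -1}, {"v": 3}]
    rule = lambda r: r["v"] > 0
    convert = lambda r: [{"v10": r["v"] * 10}]
    t1, t2 = CountingTable(rows), CountingTable(rows)
    print(check_and_convert_in_one_pass(t1, rule, convert), t1.scans)
    print(check_then_convert_separately(t2, rule, convert), t2.scans)
```

Both functions return the same second data; only the number of scans differs (1 versus 2 for this example).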
Referring to fig. 5, an embodiment of the present application provides an apparatus 500 for processing data, where the apparatus 500 is deployed on the database system 102 in the network architecture 100 shown in fig. 1 or fig. 2, or where the apparatus 500 is deployed on the database system of the method 300 shown in fig. 3. The apparatus 500 includes:
a receiving unit 501, configured to receive a data conversion request, where the data conversion request includes identification information of a first data table, identification information of a second data table, and a first quality monitoring rule;
a processing unit 502, configured to read a plurality of first data stored in the first data table based on the identification information of the first data table;
the processing unit 502 is further configured to determine, among the plurality of first data, the first data conforming to the first quality monitoring rule;
the processing unit 502 is further configured to convert the first data conforming to the first quality monitoring rule into at least one second data; and
the processing unit 502 is further configured to store the at least one second data into the second data table based on the identification information of the second data table.
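One way to picture the division of labour between the two units is the class sketch below. It is hypothetical: the in-memory `db` dictionary, the request keys, and the method names are assumptions, and the second quality monitoring rule is omitted to keep it short.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class ProcessingUnit:
    """Hypothetical sketch of processing unit 502; one method per configured function."""

    def __init__(self, db: Dict[str, List[Row]]):
        self.db = db  # in-memory stand-in for the database system's tables

    def read(self, table_id: str) -> List[Row]:
        return list(self.db[table_id])  # read the first data (see step 302)

    def select(self, rows: List[Row], rule: Callable[[Row], bool]) -> List[Row]:
        return [r for r in rows if rule(r)]  # keep conforming first data (see step 303)

    def convert(self, rows: List[Row], fn: Callable[[Row], List[Row]]) -> List[Row]:
        return [out for r in rows for out in fn(r)]  # convert to second data (see step 304)

    def store(self, table_id: str, rows: List[Row]) -> None:
        self.db.setdefault(table_id, []).extend(rows)  # save to the second data table

    def handle(self, request: dict) -> int:
        rows = self.read(request["source_table"])
        rows = self.select(rows, request["first_rule"])
        rows = self.convert(rows, request["convert"])
        self.store(request["target_table"], rows)
        return len(rows)


class ReceivingUnit:
    """Hypothetical sketch of receiving unit 501: validates the request and forwards it."""

    REQUIRED = ("source_table", "target_table", "first_rule", "convert")

    def __init__(self, processor: ProcessingUnit):
        self.processor = processor

    def receive(self, request: dict) -> int:
        missing = [k for k in self.REQUIRED if k not in request]
        if missing:
            raise ValueError(f"data conversion request is missing {missing}")
        return self.processor.handle(request)
```

Keeping validation in the receiving unit and all table access in the processing unit mirrors the split described above, and lets either unit be deployed separately.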
Optionally, for details of how the receiving unit 501 receives the data conversion request, refer to the related description of step 301 of the method 300 shown in fig. 3. Details are not repeated here.
Optionally, for details of how the processing unit 502 reads the plurality of first data stored in the first data table, refer to the related description of step 302 of the method 300 shown in fig. 3. Details are not repeated here.
Optionally, for details of how the processing unit 502 determines the first data conforming to the first quality monitoring rule among the plurality of first data, refer to the related description of step 303 of the method 300 shown in fig. 3. Details are not repeated here.
Optionally, for details of how the processing unit 502 converts the first data conforming to the first quality monitoring rule into the at least one second data, refer to the related description of step 304 of the method 300 shown in fig. 3. Details are not repeated here.
Optionally, for details of how the processing unit 502 saves the at least one second data to the second data table, refer to the related description of step 304 of the method 300 shown in fig. 3. Details are not repeated here.
Optionally, the data conversion request further includes a second quality monitoring rule, and the processing unit 502 is configured to:
determine, among the at least one second data, the second data conforming to the second quality monitoring rule; and
store the second data conforming to the second quality monitoring rule into the second data table based on the identification information of the second data table.
Optionally, for details of how the processing unit 502 determines the second data conforming to the second quality monitoring rule among the at least one second data, refer to the related description of step 305 of the method 300 shown in fig. 3. Details are not repeated here.
Optionally, for details of how the processing unit 502 stores the second data conforming to the second quality monitoring rule into the second data table, refer to the related description of step 305 of the method 300 shown in fig. 3. Details are not repeated here.
Optionally, the data conversion request includes a data read statement and a data write statement, where the data read statement includes the first quality monitoring rule and the data write statement includes the second quality monitoring rule.
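The description does not define a concrete syntax for the data read and data write statements. Assuming a hypothetical grammar `READ <table> [CHECK(<rule>)]` and `WRITE <table> [CHECK(<rule>)]`, separated by semicolons, a request could be split into its two statements, each yielding a table identifier and an optional rule. `Statement`, `STMT_RE`, and `split_request` are illustrative names.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Statement(NamedTuple):
    kind: str            # "READ" (data read statement) or "WRITE" (data write statement)
    table: str           # identification information of the data table
    rule: Optional[str]  # quality monitoring rule text, if present


# Hypothetical grammar: KIND table [CHECK(rule)]
STMT_RE = re.compile(r"\s*(READ|WRITE)\s+(\w+)\s*(?:CHECK\((.*?)\))?\s*")


def split_request(request: str) -> Tuple[Statement, Statement]:
    """Return (read statement, write statement); rules must not contain ';' in this sketch."""
    read_stmt = write_stmt = None
    for part in request.split(";"):
        if not part.strip():
            continue
        m = STMT_RE.fullmatch(part)
        if m is None:
            raise ValueError(f"cannot parse statement: {part!r}")
        stmt = Statement(m.group(1), m.group(2), m.group(3))
        if stmt.kind == "READ":
            read_stmt = stmt  # carries the first quality monitoring rule
        else:
            write_stmt = stmt  # carries the second quality monitoring rule
    if read_stmt is None or write_stmt is None:
        raise ValueError("request needs one READ and one WRITE statement")
    return read_stmt, write_stmt


if __name__ == "__main__":
    read_stmt, write_stmt = split_request(
        "READ orders CHECK(amount >= 0); WRITE orders_clean CHECK(amount <= 1000)")
    print(read_stmt)
    print(write_stmt)
```

Splitting on `;` keeps the sketch short; a real parser would need to handle semicolons and quoting inside rules.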
Optionally, the data conversion request further comprises first indication information for indicating a position of the first quality monitoring rule in the data conversion request,
the processing unit 502 is further configured to read the first quality monitoring rule from the data conversion request based on the location indicated by the first indication information.
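The first indication information could take many forms. A minimal sketch, assuming it is a hypothetical (offset, length) pair giving the character span of the first quality monitoring rule inside the request text; `ConversionRequest`, `build_request`, and `read_first_rule` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class ConversionRequest:
    text: str         # full text of the data conversion request
    rule_offset: int  # hypothetical first indication information: start index of the rule
    rule_length: int  # ... and the rule's length in characters


def build_request(prefix: str, rule: str, suffix: str) -> ConversionRequest:
    """Assemble a request and record where the first quality monitoring rule was placed."""
    return ConversionRequest(prefix + rule + suffix, len(prefix), len(rule))


def read_first_rule(req: ConversionRequest) -> str:
    """Read the rule from the position given by the indication information."""
    end = req.rule_offset + req.rule_length
    if req.rule_offset < 0 or req.rule_length < 0 or end > len(req.text):
        raise ValueError("indication information points outside the request")
    return req.text[req.rule_offset:end]


if __name__ == "__main__":
    req = build_request("CONVERT orders TO orders_clean RULE[", "amount >= 0", "]")
    print(read_first_rule(req))  # amount >= 0
```

With an explicit position, the receiver does not have to scan or parse the whole request to find the rule; it slices it out directly.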
Optionally, for details of how the processing unit 502 reads the first quality monitoring rule from the data conversion request, refer to the related description of step 301 of the method 300 shown in fig. 3. Details are not repeated here.
The receiving unit 501 and the processing unit 502 may each be implemented by software or by hardware. The following describes the implementation of the processing unit 502 as an example; the implementation of the receiving unit 501 is similar and may refer to that of the processing unit 502.
As an example of a software functional unit, the processing unit 502 may include code running on a computing instance. The computing instance may include at least one of a physical host (computing device), a virtual machine, a container, and the like. There may be one or more such computing instances. For example, the processing unit 502 may include code running on multiple hosts/virtual machines/containers. It should be noted that the multiple hosts/virtual machines/containers running the code may be distributed in the same region (region) or in different regions. Furthermore, they may be distributed in the same availability zone (availability zone, AZ) or in different AZs, where each AZ includes one data center or multiple geographically close data centers. Typically, one region may include multiple AZs.
Likewise, the multiple hosts/virtual machines/containers running the code may be distributed in the same virtual private cloud (virtual private cloud, VPC) or in multiple VPCs. Generally, one VPC is deployed in one region, and a communication gateway is deployed in each VPC to implement interconnection between VPCs in the same region and between VPCs in different regions.
As an example of a hardware functional unit, the processing unit 502 may include at least one computing device, such as a server. Alternatively, the processing unit 502 may be a device implemented using an application-specific integrated circuit (ASIC), a programmable logic device (programmable logic device, PLD), or the like. The PLD may be a complex programmable logic device (complex programmable logic device, CPLD), a field-programmable gate array (FPGA), generic array logic (generic array logic, GAL), or any combination thereof.
The processing unit 502 may include multiple computing devices distributed in the same region or in different regions. The multiple computing devices included in the processing unit 502 may be distributed in the same AZ or in different AZs. Likewise, they may be distributed in the same VPC or across multiple VPCs. The multiple computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
It should be noted that, in other embodiments, the processing unit 502 may be configured to perform any step in the method provided in any of the foregoing embodiments, and the receiving unit 501 may be configured to perform any receiving step in the method provided in any of the foregoing embodiments. The steps that the processing unit 502 and the receiving unit 501 are responsible for implementing may be specified as needed, and the processing unit 502 and the receiving unit 501 implement different steps in the method provided in any of the foregoing embodiments to implement all the functions of the apparatus 500 for processing data.
In the embodiments of the present application, the data conversion request received by the receiving unit includes a first quality monitoring rule. The processing unit reads the plurality of first data stored in the first data table based on the identification information of the first data table, determines the first data conforming to the first quality monitoring rule among the plurality of first data, and converts the conforming first data into at least one second data. Because only conforming first data are converted, the processing unit excludes first data having quality problems in the first data table and converts only first data without quality problems. The processing unit then stores the at least one second data into the second data table based on the identification information of the second data table, which prevents first data with quality problems from being carried into the second data table by the conversion.
Referring to fig. 6, an embodiment of the present application provides a computing device 600. The computing device 600 may be the database system in the network architecture 100 shown in fig. 1 or fig. 2, or the database system that performs the method 300 shown in fig. 3. As shown in fig. 6, the computing device 600 includes a bus 602, a processor 604, a memory 606, and a communication interface 608. The processor 604, the memory 606, and the communication interface 608 communicate with each other through the bus 602. The computing device 600 may be a server or a terminal device. It should be understood that the present application does not limit the number of processors or memories in the computing device 600.
The bus 602 may be a peripheral component interconnect (peripheral component interconnect, PCI) bus, an extended industry standard architecture (extended industry standard architecture, EISA) bus, or the like. Buses may be classified into address buses, data buses, control buses, and so on. For ease of illustration, only one line is used in fig. 6, but this does not mean that there is only one bus or only one type of bus. The bus 602 may include a path for transferring information between the components of the computing device 600 (for example, the memory 606, the processor 604, and the communication interface 608).
The processor 604 may include any one or more of a central processing unit (central processing unit, CPU), a graphics processor (graphics processing unit, GPU), a Microprocessor (MP), or a digital signal processor (digital signal processor, DSP).
The memory 606 may include volatile memory, such as random access memory (random access memory, RAM). The memory 606 may also include non-volatile memory, such as read-only memory (read-only memory, ROM), flash memory, a mechanical hard disk (hard disk drive, HDD), or a solid state disk (solid state drive, SSD).
Referring to fig. 6, the memory 606 stores executable program code, and the processor 604 executes the executable program code to implement the functions of the receiving unit 501 and the processing unit 502 of the apparatus 500 shown in fig. 5, thereby implementing the method provided in any of the above embodiments. That is, the memory 606 stores instructions for performing the method provided in any of the above embodiments.
The communication interface 608 uses a transceiver module, such as but not limited to a network interface card or a transceiver, to implement communication between the computing device 600 and other devices or communication networks.
The embodiment of the application also provides a computing device cluster. The cluster of computing devices includes at least one computing device. The computing device may be a server, such as a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device may also be a terminal device such as a desktop, notebook, or smart phone.
As shown in fig. 7, the cluster of computing devices includes at least one computing device 600. The same instructions for performing the methods provided by any of the embodiments described above may be stored in memory 606 in one or more computing devices 600 in a cluster of computing devices.
In some possible implementations, the memories 606 of one or more computing devices 600 in the computing device cluster may each store a portion of the instructions for performing the methods provided by any of the embodiments described above. In other words, a combination of one or more computing devices 600 may collectively execute the instructions for performing the methods provided by any of the embodiments described above.
In some possible implementations, one or more computing devices in a computing device cluster may be connected through a network, which may be a wide area network, a local area network, or the like. Fig. 8 shows one possible implementation, in which two computing devices 600A and 600B are connected by a network. Specifically, each computing device connects to the network through its communication interface.
In this possible implementation, instructions to perform the functions of processing unit 502 in the embodiment shown in FIG. 5 are stored in memory 606 in computing device 600A. Meanwhile, the memory 606 in the computing device 600B has stored therein instructions for performing the functions of the receiving unit 501 in the embodiment shown in fig. 5.
It should be appreciated that the functionality of computing device 600A shown in fig. 8 may also be performed by multiple computing devices 600. Likewise, the functionality of computing device 600B may also be performed by multiple computing devices 600.
The embodiment of the application also provides another computing device cluster. The connection between computing devices in the computing device cluster may be similar to the connection of the computing device cluster described with reference to fig. 8. In contrast, the same instructions for performing the methods provided by any of the embodiments described above may be stored in memory 606 in one or more computing devices 600 in the computing device cluster.
In some possible implementations, part of the instructions for performing the methods provided by any of the embodiments described above may also be stored in the memory 606 of one or more computing devices 600 in the computing device cluster, respectively. In other words, a combination of one or more computing devices 600 may collectively execute instructions for performing the methods provided by any of the embodiments described above.
Embodiments of the present application also provide a computer program product comprising instructions. The computer program product may be software or a program product containing instructions capable of running on a computing device or stored in any useful medium. The computer program product, when run on at least one computing device, causes the at least one computing device to perform the method provided by any of the embodiments described above.
Embodiments of the present application also provide a computer-readable storage medium. The computer-readable storage medium may be any usable medium that a computing device can access, or a data storage device, such as a data center, that contains one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk), or the like. The computer-readable storage medium includes instructions that instruct a computing device to perform the method provided by any of the embodiments described above.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing descriptions are merely preferred embodiments of the present application and are intended to illustrate its principles, not to limit it; any modifications, equivalent replacements, improvements, and the like may be made without departing from the spirit and scope of the present application.

Claims (11)

1. A method of processing data, the method comprising:
receiving a data conversion request, wherein the data conversion request comprises identification information of a first data table, identification information of a second data table and a first quality monitoring rule;
reading a plurality of first data stored in the first data table based on the identification information of the first data table;
determining first data conforming to the first quality monitoring rule in the plurality of first data;
converting the first data conforming to the first quality monitoring rule into at least one second data;
and storing the at least one second data into the second data table based on the identification information of the second data table.
2. The method of claim 1, wherein the data conversion request further comprises a second quality monitoring rule, the method further comprising:
determining second data conforming to the second quality monitoring rule in the at least one second data;
the storing the at least one second data into the second data table based on the identification information of the second data table includes:
storing the second data conforming to the second quality monitoring rule into the second data table based on the identification information of the second data table.
3. The method of claim 2, wherein the data conversion request includes a data read statement and a data write statement, the data read statement including the first quality monitoring rule, the data write statement including the second quality monitoring rule.
4. A method according to any of claims 1-3, wherein the data conversion request further comprises first indication information indicating a location of the first quality-monitoring rule in the data conversion request, the method further comprising:
reading the first quality monitoring rule from the data conversion request based on the position indicated by the first indication information.
5. An apparatus for processing data, the apparatus comprising:
a receiving unit, configured to receive a data conversion request, where the data conversion request includes identification information of a first data table, identification information of a second data table, and a first quality monitoring rule;
a processing unit, configured to read a plurality of first data stored in the first data table based on the identification information of the first data table;
the processing unit is further configured to determine first data, which accords with the first quality monitoring rule, in the plurality of first data;
the processing unit is further configured to convert the first data that accords with the first quality monitoring rule into at least one second data;
the processing unit is further configured to store the at least one second data into the second data table based on the identification information of the second data table.
6. The apparatus of claim 5, wherein the data conversion request further comprises a second quality monitoring rule, and the processing unit is configured to:
determine second data conforming to the second quality monitoring rule in the at least one second data; and
store the second data conforming to the second quality monitoring rule into the second data table based on the identification information of the second data table.
7. The apparatus of claim 6, wherein the data conversion request comprises a data read statement and a data write statement, the data read statement comprising the first quality monitoring rule and the data write statement comprising the second quality monitoring rule.
8. The apparatus of any of claims 5-7, wherein the data conversion request further comprises first indication information indicating a location of the first quality-monitoring rule in the data conversion request, the processing unit further configured to read the first quality-monitoring rule from the data conversion request based on the location indicated by the first indication information.
9. An apparatus for processing data, comprising at least one processor, wherein the at least one processor is configured to be coupled to a memory and to read and execute instructions in the memory to implement the method of any of claims 1-4.
10. A computer storage medium having stored thereon a computer program which, when executed by a processor, implements the method according to any of claims 1-4.
11. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-4.
CN202211699078.8A 2022-12-28 2022-12-28 Method, device and storage medium for processing data Pending CN116049155A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211699078.8A CN116049155A (en) 2022-12-28 2022-12-28 Method, device and storage medium for processing data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211699078.8A CN116049155A (en) 2022-12-28 2022-12-28 Method, device and storage medium for processing data

Publications (1)

Publication Number Publication Date
CN116049155A (en) 2023-05-02

Family

ID=86115014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211699078.8A Pending CN116049155A (en) 2022-12-28 2022-12-28 Method, device and storage medium for processing data

Country Status (1)

Country Link
CN (1) CN116049155A (en)

Similar Documents

Publication Publication Date Title
EP3353672B1 (en) Method and apparatus for transferring data between databases
US10210105B2 (en) Inline PCI-IOV adapter
CN110162544B (en) Heterogeneous data source data acquisition method and device
CN111352902A (en) Log processing method and device, terminal equipment and storage medium
CN108287708B (en) Data processing method and device, server and computer readable storage medium
US11650754B2 (en) Data accessing method, device, and storage medium
CN111737564B (en) Information query method, device, equipment and medium
US20230030856A1 (en) Distributed table storage processing method, device and system
WO2021086693A1 (en) Management of multiple physical function non-volatile memory devices
JP7254925B2 (en) Transliteration of data records for improved data matching
CN109597697B (en) Resource matching processing method and device
CN109388651B (en) Data processing method and device
CN111694992A (en) Data processing method and device
CN108228842B (en) Docker mirror image library file storage method, terminal, device and storage medium
CN107895044B (en) Database data processing method, device and system
US20140297953A1 (en) Removable Storage Device Identity and Configuration Information
CN114816772B (en) Debugging method, debugging system and computing device for application running based on compatible layer
CN116049155A (en) Method, device and storage medium for processing data
CN108846141B (en) Offline cache loading method and device
CN113641633A (en) File processing method, file processing device, electronic equipment, medium and computer program
US10372516B2 (en) Message processing
US10528400B2 (en) Detecting deadlock in a cluster environment using big data analytics
CN115297169B (en) Data processing method, device, electronic equipment and medium
CN112261072A (en) Service calling method, device, equipment and storage medium
CN116775510B (en) Data access method, device, server and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination