WO2024109376A1 - Data processing method, electronic device, and storage medium - Google Patents

Data processing method, electronic device, and storage medium

Info

Publication number
WO2024109376A1
WO2024109376A1 PCT/CN2023/124222 CN2023124222W WO2024109376A1 WO 2024109376 A1 WO2024109376 A1 WO 2024109376A1 CN 2023124222 W CN2023124222 W CN 2023124222W WO 2024109376 A1 WO2024109376 A1 WO 2024109376A1
Authority
WO
WIPO (PCT)
Prior art keywords
fields
field
data table
blood relationship
determining
Prior art date
Application number
PCT/CN2023/124222
Other languages
English (en)
Chinese (zh)
Inventor
吴东磊
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation

Definitions

  • The present application belongs to the field of database technology, and specifically relates to a data processing method, an electronic device, and a storage medium.
  • In the field of big data technology, data lineage describes the source and destination of data, as well as the transformations the data undergoes across multiple processing stages. It plays an important role in many business scenarios. For example, field lineage based on Structured Query Language (SQL) in a data warehouse can be applied to forward tracing of fields, backward impact range analysis, and field desensitization of sensitive data.
  • the purpose of the embodiments of the present application is to provide a data processing method, an electronic device, and a storage medium, which can save computing and storage costs and improve data processing efficiency in different application scenarios.
  • In a first aspect, an embodiment of the present application provides a method for data processing, the method comprising: determining a plurality of first SQL statements in a first database application scenario and a plurality of first fields in each of the first SQL statements; determining a first blood relationship between the plurality of first fields according to the first SQL statements and the plurality of first fields; determining a second field that does not match the first blood relationship in a first data table corresponding to the first database application scenario; and trimming the second field when reading the first data table.
  • In a second aspect, an embodiment of the present application provides a data processing device, comprising: a first determination module, used to determine multiple first SQL statements in a first database application scenario and multiple first fields in each of the first SQL statements; a second determination module, used to determine a first blood relationship between the multiple first fields based on the first SQL statement and the multiple first fields; a third determination module, used to determine a second field that does not match the first blood relationship in a first data table corresponding to the first database application scenario; and a trimming module, used to trim the second field when reading the first data table.
  • In a third aspect, an embodiment of the present application provides an electronic device, comprising a processor and a memory, wherein the memory stores programs or instructions that can be run on the processor, and when the programs or instructions are executed by the processor, the steps of the data processing method described in the first aspect are implemented.
  • In a fourth aspect, an embodiment of the present application provides a readable storage medium, on which a program or instruction is stored, and when the program or instruction is executed by a processor, the steps of the data processing method described in the first aspect are implemented.
  • FIG. 1 is a flow chart of a data processing method provided in an embodiment of the present application.
  • FIG. 2 is a flow chart of a data processing method provided in an embodiment of the present application.
  • FIG. 3 is a schematic diagram of the structure of a data processing device provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of the hardware structure of an electronic device provided in an embodiment of the present application.
  • The terms "first", "second", etc. in the specification and claims of this application are used to distinguish similar objects, and are not used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable under appropriate circumstances, so that the embodiments of the present application can be implemented in an order other than those illustrated or described here. The objects distinguished by "first", "second", etc. are generally of one type, and the number of objects is not limited.
  • For example, the first object can be one or more.
  • In addition, "and/or" in the specification and claims represents at least one of the connected objects, and the character "/" generally indicates that the associated objects are in an "or" relationship.
  • FIG. 1 shows a flow chart of a method for data processing provided by an embodiment of the present application, which can be performed by an electronic device, such as a terminal device or a server device.
  • the method can be performed by software or hardware installed in a terminal device or a server device.
  • the server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, etc.
  • the method may include the following steps.
  • S101 Determine a plurality of first SQL statements in a first database application scenario and a plurality of first fields in each of the first SQL statements.
  • the first database application scenario may be, for example, signal coverage area query, operator network fault location, etc.
  • the corresponding SQL statement is used to read relevant required information from the data table corresponding to the first database application scenario, such as signal coverage, network fault information, etc.
  • this step may read multiple first SQL statements in the first database application scenario, and then parse each of the first SQL statements to determine multiple first fields therein.
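    The parsing described in S101 can be illustrated with a minimal sketch. It assumes that fields are written as qualified `table.column` references and uses a regular expression rather than a real SQL parser; the table name `cell`, its columns, and the helper names are hypothetical and do not come from the application.

```python
import re

# Toy extractor: collects qualified references of the form table.column.
# A production system would use a full SQL parser; this regex is only an illustration.
QUALIFIED_FIELD = re.compile(r"\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)\b")


def extract_fields(sql: str) -> set[tuple[str, str]]:
    """Return the set of (table, column) pairs referenced in one SQL statement."""
    return {(t.lower(), c.lower()) for t, c in QUALIFIED_FIELD.findall(sql)}


def fields_per_statement(statements: list[str]) -> dict[str, set[tuple[str, str]]]:
    """Map each statement of an application scenario to the fields it references."""
    return {sql: extract_fields(sql) for sql in statements}


# Hypothetical statements for a "signal coverage" scenario.
scenario_sql = [
    "SELECT cell.cell_id, cell.rsrp FROM cell WHERE cell.region = 'north'",
    "SELECT cell.region, AVG(cell.rsrp) FROM cell GROUP BY cell.region",
]
first_fields = fields_per_statement(scenario_sql)
```

    A regex of this kind would also match dotted text inside string literals, which is one reason a real parser is preferable.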
  • S102 Determine a first blood relationship between the multiple first fields according to the first SQL statement and the multiple first fields.
  • Data lineage (referred to in this application as a blood relationship) captures the connections between related data found during data tracing, that is, the chain along which data is generated.
  • SQL, a standardized language for accessing data and for querying, updating, and managing relational database systems, defines several sublanguages for operating on data in a database, such as Data Definition Language (DDL), Data Manipulation Language (DML), and Data Query Language (DQL).
  • In SQL statements, operations such as select, group by, and join are performed on the fields of data tables (the same data table or different data tables), and these operations form the lineage relationship between different fields.
  • This step can use a preset SQL statement parsing tool, such as the Apache Calcite component, to parse the first blood relationship between the multiple first fields contained in each SQL statement.
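    One possible way to represent the resulting blood relationship is as a mapping from source fields to the output fields they feed. The sketch below assumes the statement has already been parsed into the hypothetical structures `parsed_outputs` and `parsed_predicates`; it does not use Apache Calcite, and all names are illustrative.

```python
# Hypothetical, already-parsed form of one statement:
# each output column maps to the source fields its expression reads.
parsed_outputs = {
    "report.region": ["cell.region"],
    "report.avg_rsrp": ["cell.rsrp"],
}
# Fields used only by filters, joins, or grouping also take part in the lineage.
parsed_predicates = ["cell.status"]


def build_lineage(outputs, predicates):
    """Return {source_field: set_of_target_fields}.

    Fields that appear only in predicates map to an empty set, so they are
    still recorded as used by the statement.
    """
    lineage = {}
    for target, sources in outputs.items():
        for source in sources:
            lineage.setdefault(source, set()).add(target)
    for source in predicates:
        lineage.setdefault(source, set())
    return lineage


lineage = build_lineage(parsed_outputs, parsed_predicates)
```

    Keeping predicate-only fields in the mapping matters for the later trimming step: a field that is filtered on but never selected is still needed when the table is read.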
  • Next, based on the blood relationship obtained in the above steps, a second field that does not match the first blood relationship is determined in the first data table corresponding to the first database application scenario. A mismatch with the first blood relationship includes the case where the first blood relationship does not contain the second field, that is, the second field is a redundant field in the first database application scenario.
  • Finally, the second field is trimmed when the first data table is read, which reduces the corresponding I/O operations in different application scenarios, saves computing and storage costs, and improves data processing efficiency.
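    The last two steps amount to a set difference followed by a projection. The sketch below is only an illustration under assumed data: the column names and the in-memory rows are hypothetical, and in a real system the skipped columns would be omitted by the storage layer rather than filtered in Python.

```python
def redundant_fields(table_columns, lineage_fields):
    """Columns of the table that never appear in the lineage (order preserved)."""
    return [c for c in table_columns if c not in lineage_fields]


def read_trimmed(rows, table_columns, lineage_fields):
    """Yield rows restricted to the columns that the lineage actually uses."""
    keep = [c for c in table_columns if c in lineage_fields]
    for row in rows:
        yield {c: row[c] for c in keep}


# Hypothetical first data table and the fields found in the blood relationship.
columns = ["cell_id", "region", "rsrp", "vendor", "install_date"]
used = {"cell_id", "region", "rsrp"}
rows = [
    {"cell_id": 1, "region": "north", "rsrp": -95,
     "vendor": "x", "install_date": "2020-01-01"},
]

print(redundant_fields(columns, used))  # ['vendor', 'install_date']
print(list(read_trimmed(rows, columns, used)))
```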
  • The data processing method provided by this embodiment of the present application comprises: determining a plurality of first SQL statements in a first database application scenario and a plurality of first fields in each of the first SQL statements; determining a first blood relationship between the plurality of first fields according to the first SQL statements and the plurality of first fields; determining a second field that does not match the first blood relationship in a first data table corresponding to the first database application scenario; and trimming the second field when reading the first data table. This can save computing and storage costs and improve data processing efficiency in different application scenarios.
  • the field lineage in this application can be applied to various applications on the big data platform, such as data assets, data development, data governance, data security and other fields, especially the shared data service of the data middle platform.
  • With the data processing method described in the embodiments of this application, data can be collected on demand for different upper-level applications and system resources can be optimized at a fine granularity, thereby saving computing, storage, development and other costs and improving the efficiency of data processing.
  • In one implementation, determining the first blood relationship between the multiple first fields based on the first SQL statement and the multiple first fields includes: determining the first grammar of the first SQL statement; and, when the first grammar satisfies a preset grammar condition, determining the first blood relationship between the multiple first fields based on the first SQL statement and the multiple first fields.
  • In one implementation, in addition to determining the first blood relationship between the multiple first fields, the method also includes: verifying the multiple first fields, determining incorrect fields among the multiple first fields, and prompting correction of the incorrect fields among the multiple first fields.
  • In one implementation, verifying the multiple first fields and determining incorrect fields among them, so as to prompt correction of the incorrect fields, includes: determining the type of the first field according to metadata information of the first data table; judging whether the type of the first field meets a preset condition; and, if not, prompting correction of the type of the first field.
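    A type check of this kind might look like the following sketch. The application does not say what the preset condition is, so the rule table `ALLOWED_TYPES`, the usage contexts, and the column metadata are all assumptions made for illustration.

```python
# Hypothetical metadata of the first data table: column name -> declared type.
metadata = {"cell_id": "INTEGER", "region": "TEXT", "rsrp": "REAL"}

# Hypothetical preset condition: each usage context accepts only certain types.
ALLOWED_TYPES = {
    "aggregate_numeric": {"INTEGER", "REAL"},
    "group_key": {"INTEGER", "TEXT"},
}


def check_field_types(usages, metadata, allowed=ALLOWED_TYPES):
    """Return prompts for fields whose declared type fails the preset condition.

    `usages` is a list of (field_name, usage_context) pairs.
    """
    prompts = []
    for field_name, context in usages:
        field_type = metadata.get(field_name)
        if field_type is None:
            prompts.append(f"{field_name}: not found in table metadata")
        elif field_type not in allowed[context]:
            prompts.append(f"{field_name}: type {field_type} is not valid for {context}")
    return prompts


prompts = check_field_types(
    [("rsrp", "aggregate_numeric"), ("region", "aggregate_numeric")],
    metadata,
)
```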
  • In one implementation, verifying the multiple first fields and determining incorrect fields among them, so as to prompt correction of the incorrect fields, includes: determining a first parameter of a first function corresponding to the first field, the first parameter including at least one of an input parameter and an output parameter; judging whether the type of the first parameter satisfies a preset parameter type; and, if not, prompting correction of the type of the first parameter.
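    The function-parameter check can be sketched in the same spirit. The signature table `SIGNATURES` and the coarse type names below are assumptions chosen for illustration; the application does not list the preset parameter types.

```python
# Hypothetical preset signatures: function name -> (input parameter types, output type).
SIGNATURES = {
    "AVG": (("NUMERIC",), "NUMERIC"),
    "UPPER": (("TEXT",), "TEXT"),
}


def check_function_call(name, arg_types, result_type, signatures=SIGNATURES):
    """Return prompts for input or output parameter types that differ from the preset types."""
    expected_inputs, expected_output = signatures[name]
    prompts = []
    if len(arg_types) != len(expected_inputs):
        prompts.append(
            f"{name}: expected {len(expected_inputs)} argument(s), got {len(arg_types)}"
        )
    for position, (actual, expected) in enumerate(zip(arg_types, expected_inputs), start=1):
        if actual != expected:
            prompts.append(f"{name}: argument {position} should be {expected}, got {actual}")
    if result_type != expected_output:
        prompts.append(f"{name}: result should be {expected_output}, got {result_type}")
    return prompts
```

    For example, `check_function_call("AVG", ["TEXT"], "NUMERIC")` reports that the first argument should be numeric, while a correctly typed call returns an empty list.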
  • Through this verification, the corresponding SQL statements and fields are confirmed to be correct, which ensures that accurate blood relationships are generated, so that redundant fields can be analyzed more accurately and computation and storage can be targeted precisely.
  • FIG. 2 shows a flow chart of a method for data processing provided by an embodiment of the present application, which can be performed by an electronic device, such as a terminal device or a server device.
  • the method can be performed by software or hardware installed in a terminal device or a server device.
  • the server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, etc.
  • the method may include the following steps.
  • S200 Determine metadata information of a first data table corresponding to a first database application scenario based on a preset method.
  • The preset method includes reading the data definition statement (DDL) corresponding to the first data table, or obtaining the first data table by connecting to a database.
  • The database connection can be made, for example, through Java Database Connectivity (JDBC).
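    As a rough illustration of S200, the sketch below obtains column metadata by executing a DDL statement in an in-memory SQLite database and reading the schema back over the connection. SQLite and Python's `sqlite3` module stand in for the JDBC connection mentioned above; the table `cell` and its columns are hypothetical.

```python
import sqlite3


def table_metadata(ddl: str, table: str) -> dict[str, str]:
    """Run a CREATE TABLE statement in an in-memory database and
    return {column name: declared type} for the given table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(ddl)
        # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
        return {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}
    finally:
        conn.close()


ddl = "CREATE TABLE cell (cell_id INTEGER, region TEXT, rsrp REAL, vendor TEXT)"
metadata = table_metadata(ddl, "cell")
```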
  • S201 Determine a plurality of first SQL statements in the first database application scenario and a plurality of first fields in each of the first SQL statements.
  • S202 Determine a first blood relationship between the multiple first fields according to the first SQL statement and the multiple first fields.
  • In one implementation, step S202 includes: generating a first syntax tree according to the first SQL statement; determining a first node tree according to the first syntax tree and the plurality of first fields, wherein the leaf nodes of the first node tree include the plurality of first fields, and the non-leaf nodes of the first node tree include a plurality of operators; and traversing the first node tree and determining the first blood relationship between the multiple first fields according to the attributes of the plurality of operators.
  • the operators include select, group by, join, having and other operators in SQL statements.
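    The node-tree traversal can be sketched as follows. The `Node` class, the `output` attribute, and the example tree are assumptions made for illustration: the application only states that leaf nodes hold fields, non-leaf nodes hold operators, and lineage is derived from operator attributes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    """Leaf nodes carry a field name; non-leaf nodes carry an operator name."""
    label: str
    children: List["Node"] = field(default_factory=list)
    # Hypothetical operator attribute: the output column this operator produces, if any.
    output: Optional[str] = None


def leaf_fields(node: Node) -> Set[str]:
    """Collect the field names at the leaves below (or at) this node."""
    if not node.children:
        return {node.label}
    result: Set[str] = set()
    for child in node.children:
        result |= leaf_fields(child)
    return result


def lineage(node: Node, edges: Optional[Dict[str, Set[str]]] = None) -> Dict[str, Set[str]]:
    """Traverse the tree; each operator with an output links its leaf fields to that output."""
    edges = {} if edges is None else edges
    if node.children:
        if node.output is not None:
            edges.setdefault(node.output, set()).update(leaf_fields(node))
        for child in node.children:
            lineage(child, edges)
    return edges


# Roughly: SELECT region, AVG(rsrp) AS avg_rsrp FROM cell GROUP BY region
tree = Node("select", [
    Node("project", [Node("cell.region")], output="region"),
    Node("avg", [Node("cell.rsrp")], output="avg_rsrp"),
    Node("group by", [Node("cell.region")]),
])
edges = lineage(tree)
used = leaf_fields(tree)
```

    Here `edges` records which source fields feed each output column, while `used` is the full set of fields that take part in the statement, including those referenced only by grouping.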
  • S203 Determine, according to the metadata information of the first data table and the first blood relationship, a second field in the first data table that does not match the first blood relationship.
  • This step combined with the metadata information of the first data table, can find all second fields in the first data table that do not match the first blood relationship, that is, all redundant fields.
  • In the SQL execution plan corresponding to the first data table, the second field is trimmed to generate a trimmed and optimized SQL execution plan.
  • After trimming, the database can skip the second field when reading and writing at the underlying storage layer, thereby reducing data I/O and improving SQL execution efficiency.
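    Trimming inside an execution plan resembles column pruning of scan nodes. The sketch below uses a hypothetical dictionary-based plan format (`op`, `inputs`, `columns`), not the plan representation of any particular database engine.

```python
import copy


def trim_plan(plan, redundant):
    """Return a copy of the plan in which every scan node omits the redundant columns."""
    trimmed = copy.deepcopy(plan)
    stack = [trimmed]
    while stack:
        node = stack.pop()
        if node["op"] == "scan":
            node["columns"] = [c for c in node["columns"] if c not in redundant]
        stack.extend(node.get("inputs", []))
    return trimmed


# Hypothetical plan: an aggregation over a full scan of the first data table.
plan = {
    "op": "aggregate",
    "inputs": [
        {"op": "scan", "table": "cell",
         "columns": ["cell_id", "region", "rsrp", "vendor"]},
    ],
}
optimized = trim_plan(plan, {"vendor"})
```

    The original plan is left untouched, so the trimmed and untrimmed plans can be compared or the trimming can be undone.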
  • In the data processing method provided in this embodiment of the present application: metadata information of a first data table corresponding to a first database application scenario is determined based on a preset method, wherein the preset method includes reading the data definition statement (DDL) corresponding to the first data table or obtaining the first data table by connecting to a database; multiple first SQL statements in the first database application scenario and multiple first fields in each of the first SQL statements are determined; a first blood relationship between the multiple first fields is determined based on the first SQL statement and the multiple first fields; a second field in the first data table that does not match the first blood relationship is determined based on the metadata information of the first data table and the first blood relationship; and, in the SQL execution plan corresponding to the first data table, the second field is trimmed to generate a trimmed and optimized SQL execution plan. This can save computing and storage costs and improve data processing efficiency in different application scenarios.
  • the data processing method provided in the embodiment of the present application can be executed by a data processing device or a control module in the data processing device for executing the data processing method.
  • the data processing device provided in the embodiment of the present application is described by taking the data processing method executed by the data processing device as an example.
  • FIG. 3 is a schematic diagram of the structure of a data processing device provided in an embodiment of the present application.
  • The data processing device 300 includes: a first determination module 310, a second determination module 320, a third determination module 330, and a trimming module 340.
  • the first determination module 310 is used to determine multiple first SQL statements in the first database application scenario and multiple first fields in each of the first SQL statements; the second determination module 320 is used to determine the first blood relationship between the multiple first fields based on the first SQL statement and the multiple first fields; the third determination module 330 is used to determine the second field that does not match the first blood relationship in the first data table corresponding to the first database application scenario; the trimming module 340 is used to trim the second field when reading the first data table.
  • the second determination module 320 is used to: determine a first grammar of the first SQL statement; and determine a first blood relationship between the multiple first fields based on the first SQL statement and the multiple first fields when the first grammar meets a preset grammar condition.
  • the second determination module 320 is used to: generate a first syntax tree according to the first SQL statement; determine a first node tree according to the first syntax tree and the multiple first fields, the leaf nodes of the first node tree include the multiple fields, and the non-leaf nodes of the first node tree include multiple operators; traverse the first node tree, and determine the first blood relationship between the multiple first fields according to the attributes of the multiple operators.
  • the trimming module 340 is used to trim the second field from the SQL execution plan corresponding to the first data table, and generate a trimmed and optimized SQL execution plan.
  • The device 300 also includes an acquisition module, used to determine the metadata information of the first data table based on a preset method, wherein the preset method includes reading the data definition statement (DDL) corresponding to the first data table or obtaining the first data table by connecting to a database; the third determination module 330 is used to determine the second field in the first data table that does not match the first blood relationship based on the metadata information of the first data table and the first blood relationship.
  • The device 300 further includes a verification module, configured to verify the plurality of first fields, determine incorrect fields among the plurality of first fields, and prompt correction of the incorrect fields among the plurality of first fields.
  • the verification module is used to: determine the type of the first field according to the metadata information of the first data table; determine whether the type of the first field meets a preset condition; if not, prompt to correct the type of the first field.
  • the verification module is used to: determine a first parameter of a first function corresponding to the first field, the first parameter including at least one of an input parameter and an output parameter; determine whether a type of the first parameter satisfies a preset parameter type; if not, prompt to correct the type of the first parameter.
  • An embodiment of the present application provides a data processing device, which uses a first determination module to determine multiple first SQL statements in a first database application scenario and multiple first fields in each of the first SQL statements; a second determination module to determine a first blood relationship between the multiple first fields based on the first SQL statement and the multiple first fields; a third determination module to determine, in a first data table corresponding to the first database application scenario, a second field that does not match the first blood relationship; and a trimming module to trim the second field when reading the first data table, which can save computing and storage costs and improve data processing efficiency in different application scenarios.
  • The data processing device provided in the embodiment of the present application can implement the various processes of the data processing method embodiments described in FIG. 1 and FIG. 2, and achieve the same technical effect. To avoid repetition, the details are not described again here.
  • the data processing device in the embodiments of the present application may be a device, or a component, integrated circuit, or chip in a terminal device.
  • the device may be a mobile electronic device or a non-mobile electronic device.
  • The mobile electronic device may be a mobile phone, a tablet computer, a laptop computer, an in-vehicle electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA).
  • The non-mobile electronic device may be a server, a network attached storage (NAS) device, a personal computer (PC), a television (TV), an ATM, etc., which is not specifically limited in the embodiments of the present application.
  • the data processing device in the embodiment of the present application may be a device having an operating system.
  • the operating system may be an Android operating system, an iOS operating system, or other possible operating systems, which are not specifically limited in the embodiment of the present application.
  • An embodiment of the present application further provides an electronic device 400, including a processor 401, a memory 402, and a program or instruction stored in the memory 402 and executable on the processor 401; when the program or instruction is executed by the processor 401, the method of data processing described in at least one of the embodiments of FIG. 1 and FIG. 2 is implemented.
  • The electronic device in the embodiment of the present application includes: a server, a terminal device, or a device other than a terminal device.
  • the above electronic device structure does not constitute a limitation on the electronic device.
  • the electronic device may include more or fewer components than shown in the figure, or combine certain components, or arrange the components differently.
  • The input unit may include a graphics processing unit (GPU) and a microphone.
  • the display unit may be configured with a display panel in the form of a liquid crystal display, an organic light-emitting diode, etc.
  • the user input unit includes a touch panel and at least one of other input devices.
  • the touch panel is also called a touch screen.
  • Other input devices may include, but are not limited to, a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which will not be repeated here.
  • the memory can be used to store software programs and various data.
  • the memory may mainly include a first storage area for storing programs or instructions and a second storage area for storing data, wherein the first storage area may store an operating system, an application program or instructions required for at least one function (such as a sound playback function, an image playback function, etc.), etc.
  • the memory may include a volatile memory or a non-volatile memory, or the memory may include both volatile and non-volatile memories.
  • the non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
  • The volatile memory can be Random Access Memory (RAM), Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), or Direct Rambus RAM (DRRAM).
  • the processor may include one or more processing units; optionally, the processor integrates an application processor and a modem processor, wherein the application processor mainly processes operations related to the operating system, user interface, and application programs, and the modem processor mainly processes communication signals, such as a baseband processor. It is understandable that the modem processor may not be integrated into the processor.
  • An embodiment of the present application also provides a readable storage medium, on which a program or instruction is stored. When the program or instruction is executed by a processor, the method of data processing described in at least one of the embodiments of FIG. 1 and FIG. 2 is implemented, and the same technical effect can be achieved. To avoid repetition, the details are not described again here.
  • the processor is a processor in the electronic device described in the above embodiment.
  • the readable storage medium includes a computer readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, etc.
  • An embodiment of the present application further provides a chip, which includes a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is used to run programs or instructions to implement the various processes of the above-mentioned data processing method embodiment, and can achieve the same technical effect. To avoid repetition, the details are not described again here.
  • The chip mentioned in the embodiments of the present application can also be called a system-level chip, a system chip, a chip system, or a system-on-chip, etc.
  • The technical solution of the present application can be embodied in the form of a computer software product, which is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disk), and includes a number of instructions for a terminal (which can be a mobile phone, a computer, a server, or a network device, etc.) to execute the methods described in each embodiment of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application belongs to the technical field of databases. A data processing method, an electronic device, and a storage medium are disclosed. The method comprises: determining a plurality of first SQL statements in a first database application scenario and a plurality of first fields in each of the first SQL statements; determining a first blood relationship between the plurality of first fields according to the first SQL statements and the plurality of first fields; determining, from a first data table corresponding to the first database application scenario, a second field that does not match the first blood relationship; and trimming the second field when the first data table is read.
PCT/CN2023/124222 2022-11-22 2023-10-12 Data processing method, electronic device, and storage medium WO2024109376A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211467472.9 2022-11-22
CN202211467472.9A CN118069676A (zh) 2022-11-22 Data processing method, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2024109376A1 (fr) 2024-05-30

Family

ID=91099586

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/124222 WO2024109376A1 (fr) 2022-11-22 2023-10-12 Data processing method, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN118069676A (fr)
WO (1) WO2024109376A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180165337A1 (en) * 2016-12-13 2018-06-14 Ca, Inc. System for Extracting Data from a Database in a User Selected Format and Related Methods and Computer Program Products
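CN110633333A (zh) * 2019-09-25 2019-12-31 京东数字科技控股有限公司 Method and system for processing data blood relationships, computing device, and medium
CN113961584A (zh) * 2021-10-20 2022-01-21 平安银行股份有限公司 Field blood relationship analysis method and apparatus, electronic device, and storage medium
CN114265945A (zh) * 2021-12-30 2022-04-01 多点生活(武汉)科技有限公司 Blood relationship extraction method and apparatus, and electronic device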

Also Published As

Publication number Publication date
CN118069676A (zh) 2024-05-24
