CN108536745B - Shell-based data table extraction method, terminal, equipment and storage medium

Publication number: CN108536745B
Authority: CN (China)
Prior art keywords: data; data table; name; data information; preset
Legal status: Active
Application number: CN201810196485.4A
Other languages: Chinese (zh)
Other versions: CN108536745A (en)
Inventors: 林林, 戴建明
Current Assignee: Ping An Technology Shenzhen Co Ltd
Original Assignee: Ping An Technology Shenzhen Co Ltd
Application filed by: Ping An Technology Shenzhen Co Ltd
Priority to: PCT/CN2018/101880 (WO2019161645A1)
Publications: CN108536745A; CN108536745B (granted)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2455: Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a Shell-based data table extraction method, terminal, equipment and storage medium. The method comprises: identifying a data table in a Shell script; extracting the table name of the data table; classifying the data table according to the extracted table name, wherein the types of data table comprise a source table and a target table; and acquiring the data information corresponding to the different types of data table and outputting the acquired data information to the same preset document. With this improved data table extraction method, the data tables involved in each script no longer have to be looked up laboriously, the collating and updating process is simplified as far as possible, and a large amount of human resources can be saved.

Description

Shell-based data table extraction method, terminal, equipment and storage medium
Technical Field
The invention relates to the technical field of computers, in particular to a data table extraction method, a terminal, equipment and a storage medium based on Shell.
Background
Shell is a freely available scripting language that allows automated and interactive tasks to run without human intervention. A script can be written to supply input to a command or program, and the Shell can simulate the input that the program prompts for, so that interactive programs can be executed automatically.
In existing applications, Shell scripts often involve many data tables. If each Shell script is combed through manually to obtain the data tables it uses, extraction is very time-consuming and the workload is very large. In addition, the statements that refer to data tables in a Shell script change as the application version is modified; if this information is collated and updated manually, a large amount of labor is consumed and the collated list of data tables is prone to errors.
Disclosure of Invention
In view of this, embodiments of the present invention provide a Shell-based data table extraction method, terminal, device, and storage medium, which can simplify the collating and updating process as far as possible and save a large amount of human resources.
In one aspect, an embodiment of the present invention provides a data table extraction method based on Shell, including:
identifying a data table in the Shell script;
extracting the table name of the data table;
classifying the data table according to the extracted table name, wherein the data table comprises a source table and a target table;
and acquiring data information corresponding to the data tables of different types, and outputting the acquired data information of different types to the same preset document.
In another aspect, an embodiment of the present invention provides a data table extraction terminal based on Shell, where the terminal includes:
the identification unit is used for identifying a data table in the Shell script;
the extracting unit is used for extracting the table name of the data table;
the classification unit is used for classifying the data table according to the extracted table name, wherein the data table comprises a source table and a target table;
and the acquisition unit is used for acquiring the data information corresponding to the data tables of different types and outputting the acquired data information of different types to the same preset document.
In another aspect, an embodiment of the present invention further provides a data table extracting apparatus based on Shell, including:
a memory for storing a program for implementing the data table extraction method; and
a processor for executing a program stored in the memory for implementing a data table extraction method to perform the method as described above.
In yet another aspect, embodiments of the present invention also provide a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the method described above.
The embodiment of the invention identifies a data table in the Shell script; extracts the table name of the data table; classifies the data table according to the extracted table name, wherein the types of data table comprise a source table and a target table; and acquires the data information corresponding to the different types of data table and outputs the acquired data information to the same preset document. With this improved data table extraction method, the data tables involved in each script no longer have to be looked up laboriously, the collating and updating process is simplified as far as possible, and a large amount of human resources can be saved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic flow chart of a Shell-based data table extraction method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a Shell-based data table extraction method according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a Shell-based data table extraction method according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of a Shell-based data table extraction method according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart of a Shell-based data table extraction method according to another embodiment of the present invention;
FIG. 6 is a schematic block diagram of a Shell-based data table extraction terminal according to an embodiment of the present invention;
FIG. 7 is another schematic block diagram of a Shell-based data table extraction terminal according to an embodiment of the present invention;
FIG. 8 is another schematic block diagram of a Shell-based data table extraction terminal according to an embodiment of the present invention;
FIG. 9 is another schematic block diagram of a Shell-based data table extraction terminal according to an embodiment of the present invention;
FIG. 10 is another schematic block diagram of a Shell-based data table extraction terminal according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a Shell-based data table extraction device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
Referring to FIG. 1, FIG. 1 is a schematic flow chart of a Shell-based data table extraction method according to an embodiment of the present invention. The method can run on terminals such as smart phones (for example Android phones and iOS phones), tablet computers, notebook computers and smart devices. With the data table extraction method of the embodiment of the invention, the data tables involved in each script no longer have to be looked up laboriously, the collating and updating process is simplified as far as possible, and a large amount of human resources can be saved. The method includes steps S101 to S104.
S101, identifying a data table in the Shell script.
In the embodiment of the invention, a data table is a table that the Shell script calls from a database by connecting to the database and issuing SQL statements. The script connects to the database and calls the data table in order to acquire data from the database; in daily operation and maintenance work, acquiring this data makes it possible to monitor certain information in the database and thus to follow the performance of the equipment in real time.
A data table in the Shell script can be identified by recognizing keywords in SQL statements. For example, recognizing the insert statement keyword "insert into" identifies the data table that follows it; recognizing the query statement "select ... from" identifies the data table that follows the from keyword; recognizing the update statement keyword "update" identifies the data table that follows it; and recognizing the delete statement keyword "delete from" identifies the data table that follows it.
S102, extracting the table name of the data table.
In the embodiment of the invention, after a data table in the Shell script is identified, its table name is extracted. For example, in the insert statement "insert into {TABLENAME}", the extracted table name is "TABLENAME"; in the query statement "select * from {USERNAME}", the extracted table name is "USERNAME"; in the update statement "update {DBNAME}", the extracted table name is "DBNAME"; and in the delete statement "delete from {KBNAME}", the extracted table name is "KBNAME".
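Steps S101 and S102 can be illustrated with a short Python sketch. The keywords come from the description above; the regular expression, the function name `extract_table_names`, and the handling of the `{...}` placeholders are assumptions made for illustration and are not the patented implementation.

```python
import re

# Keywords named in the description; the pattern itself is an illustrative assumption.
TABLE_KEYWORDS = re.compile(
    r"\b(insert\s+into|delete\s+from|from|update)\s+\{?(\w+)\}?",
    re.IGNORECASE,
)


def extract_table_names(script_text):
    """Return (keyword, table_name) pairs in order of appearance."""
    pairs = []
    for match in TABLE_KEYWORDS.finditer(script_text):
        # Normalize internal whitespace so "insert   into" becomes "insert into".
        keyword = " ".join(match.group(1).lower().split())
        pairs.append((keyword, match.group(2)))
    return pairs


sample = """
insert into {TABLENAME}
select * from {USERNAME}
update {DBNAME}
delete from {KBNAME}
"""
print(extract_table_names(sample))
```

Because the pattern requires whitespace after the keyword, a function name such as from_unixtime is not picked up as a table reference.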
S103, classifying the data table according to the extracted table name, wherein the data table comprises a source table and a target table.
In the embodiment of the present invention, after the table names are extracted by means of the keywords of a series of SQL statements, they are stored in a temporary file. The types of data table include the source table and the target table, and a data table may be classified according to its table name as follows. If the table name is an independent character string, with a space or line feed before and after it, and is immediately preceded by the from keyword, the data table is a source table. If the table name is an independent character string, with a space or line feed before and after it, and is immediately preceded by a keyword other than from, the data table is a target table. Optionally, the non-from keyword may be an SQL statement keyword such as into or update.
Further, as shown in fig. 2, the step S103 includes steps S201 to S202.
S201, determining a character string corresponding to the data table name.
In the embodiment of the present invention, the character string refers to a string of characters corresponding to a table name of the data table, and since the table name of the data table may be composed of numbers, letters, and underlines, the character string may also be composed of numbers, letters, and underlines.
S202, classifying the data table according to the character string.
In the embodiment of the present invention, the data table is classified according to the character string, and the classification depends on the SQL statement keyword in front of the table name. The classification may be as follows. If the table name is an independent character string, with a space or line feed before and after it, and is immediately preceded by the from keyword, the data table is a source table. If the table name is an independent character string, with a space or line feed before and after it, and is immediately preceded by a keyword other than from, the data table is a target table. Optionally, the non-from keyword may be an SQL statement keyword such as into or update. Classifying the data tables by means of the character strings excludes interference from invalid data tables. For example, in "from_unixtime" the characters following "from" are not the specified content (an independent character string), so the match is treated as an invalid data table.
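The classification rule of steps S201 and S202 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the pattern, the function name `classify_tables`, and the sample table names are hypothetical, and only the keywords from, into and update mentioned in the text are covered.

```python
import re

# The table name must be an independent string: whitespace (or the start/end
# of the text) on both sides. Only keywords mentioned in the text are covered.
STANDALONE_AFTER_KEYWORD = re.compile(
    r"(?<!\S)(from|into|update)\s+(\w+)(?!\S)", re.IGNORECASE
)


def classify_tables(sql_text):
    """Split table names into source tables (after from) and target tables."""
    result = {"source": [], "target": []}
    for keyword, name in STANDALONE_AFTER_KEYWORD.findall(sql_text):
        kind = "source" if keyword.lower() == "from" else "target"
        if name not in result[kind]:
            result[kind].append(name)
    return result


sql = """insert into report_daily
select id, from_unixtime(ts) from orders
"""
print(classify_tables(sql))
```

The strict whitespace requirement means that a name followed directly by punctuation, such as a semicolon, is not matched; a practical tool would likely relax this, which the text does not discuss.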
S104, acquiring data information corresponding to the data tables of different types, and outputting the acquired data information of different types to the same preset document.
In the embodiment of the invention, the data table is a Hadoop JOB association table. The JOB association table is written using Hadoop statements and SQL statements and is stored in the corresponding database, and its table name is written into the corresponding Shell script. To identify the JOB association tables, the table names in the Shell script are extracted first; that is, the method identifies which JOB association tables the script involves and whether each of them is a source table or a target table.
It should be noted that a source table is a table inside Hadoop or a table of an external relational database; the character string of a source table has a space or line feed before and after it, and the table name is preceded by the from keyword. Target tables are divided by write mode into insert target tables and overlay target tables; for example, in "insert into tableA" the table tableA is an insert target table, and in "insert overwrite table tableB" the table tableB is an overlay target table. The preset document may be a data table in a preset database. For example, keywords and related content are captured from the scripts of the JOB association tables and recorded in a temporary file; this capture is completed at the HDFS level of Hadoop. The results in the temporary file are then loaded into a Hive table of Hadoop, and the data in the Hive table is output by Sqoop to a specified preset Oracle database, specifically to a pre-established data table in that database. Optionally, the user may build an Oracle Pkg (Oracle package) around the pre-established data table that stores the data information; if the data table needs to be optimized, only the Oracle Pkg needs to be optimized.
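The first stage of this output path, recording the captured content in a temporary file, might look like the Python sketch below. The tab-separated layout, the column order (script, table, type), the function name `write_records`, and the sample values are all assumptions; loading the file into Hive and exporting it to Oracle with Sqoop are not shown.

```python
import csv
import os
import tempfile


def write_records(records, path):
    """Write (script, table, type) rows to a tab-separated file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        for script, table, kind in records:
            writer.writerow([script, table, kind])


# Hypothetical captured content for one script.
records = [
    ("load_orders.sh", "orders", "source"),
    ("load_orders.sh", "report_daily", "target"),
]

fd, tmp_path = tempfile.mkstemp(suffix=".tsv")
os.close(fd)
write_records(records, tmp_path)
with open(tmp_path, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
os.remove(tmp_path)
print(lines)
```

A delimited text file is chosen here only because it is a common input format for loading data into a Hive table; the text does not specify the file format.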
Further, as shown in fig. 3, if the data table is a source table, the step S104 includes steps S301 to S303.
S301, dividing the source table into an internal source table and an external source table.
In embodiments of the present invention, the internal source table refers to a table inside a Hadoop (e.g., Hive table of Hadoop), and the external source table refers to a table of an external relational database.
S302, acquiring data information corresponding to the internal source table and the external source table.
In the embodiment of the present invention, the data information includes table information, field information, and the like, where the table information may be a table name, a table type, and the like, and the field information may be a field name, a field type, and the like.
S303, outputting the acquired data information to a preset document.
In the embodiment of the present invention, the preset document may be a preset data table in a preset Oracle database; specifically, the acquired data information may be output to that preset data table. A user may build an Oracle Pkg (Oracle package) around the preset data table in which the data information is stored; if the data table needs to be optimized, only the Oracle Pkg needs to be optimized in order to optimize the preset data table established in the preset Oracle database.
Further, as shown in fig. 4, if the data table is a target table, the step S104 includes steps S401 to S403.
S401, dividing the target table into an insertion target table and an overlay target table.
In an embodiment of the present invention, the type of the target table is determined by the SQL statement keyword before the table name of the data table. A table that follows into is an insert target table, such as the table tableA in the SQL statement "insert into tableA"; a table that follows overwrite table is an overlay target table, such as the table tableB in the SQL statement "insert overwrite table tableB".
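The split in step S401 can be sketched in Python. The pattern and the function name `split_targets` are illustrative assumptions; the two statement forms are the ones quoted in the text.

```python
import re

# "insert into <name>" marks an insert target table;
# "insert overwrite table <name>" marks an overlay target table.
TARGET_PATTERN = re.compile(
    r"\binsert\s+(into|overwrite)\s+(?:table\s+)?(\w+)", re.IGNORECASE
)


def split_targets(sql_text):
    """Group target table names by write mode."""
    targets = {"insert": [], "overlay": []}
    for mode, name in TARGET_PATTERN.findall(sql_text):
        kind = "insert" if mode.lower() == "into" else "overlay"
        targets[kind].append(name)
    return targets


statements = (
    "insert into tableA select * from s; "
    "insert overwrite table tableB select * from s"
)
print(split_targets(statements))
```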
S402, acquiring data information corresponding to the insertion target table and the coverage target table.
In the embodiment of the present invention, the data information includes table information, field information, and the like, where the table information may be a table name, a table type, and the like, and the field information may be a field name, a field type, and the like.
S403, outputting the acquired data information to a preset document.
In the embodiment of the present invention, the preset document may be a preset data table in a preset Oracle database; specifically, the acquired data information may be output to that preset data table. A user may build an Oracle Pkg (Oracle package) around the preset data table in which the data information is stored; if the data table needs to be optimized, only the Oracle Pkg needs to be optimized in order to optimize the preset data table established in the preset Oracle database.
As can be seen from the above, the embodiment of the present invention identifies a data table in the Shell script; extracts the table name of the data table; classifies the data table according to the extracted table name, wherein the types of data table comprise a source table and a target table; and acquires the data information corresponding to the different types of data table and outputs the acquired data information to the same preset document. With this improved data table extraction method, the data tables involved in each script no longer have to be looked up laboriously, the collating and updating process is simplified as far as possible, and a large amount of human resources can be saved.
Referring to FIG. 5, FIG. 5 is a schematic flow chart of a Shell-based data table extraction method according to another embodiment of the present invention. The method can run on terminals such as smart phones (for example Android phones and iOS phones), tablet computers, notebook computers and smart devices. As shown in FIG. 5, the method includes steps S501 to S506.
S501, traversing the Shell script according to preset keywords.
In the embodiment of the invention, the Shell script is traversed by first traversing the data tables with a short creation time and then the data tables with a long creation time, so that all data tables of the Shell script are traversed. Traversing the Shell script from short creation time to long creation time can improve the efficiency of processing the Shell script.
S502, locating a data table in the Shell script according to the traversal result.
In the embodiment of the invention, the traversal result is used to display the position of each data table in the Shell script, and the data table is located according to the displayed position information.
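One way to picture steps S501 and S502 is the Python sketch below, which scans a script for the preset keywords and records where each table name occurs. Representing a position as a (line number, column, name) tuple is an assumption, as are the pattern and the function name `locate_tables`; the text does not specify how positions are displayed, and the creation-time ordering is not modeled here.

```python
import re

KEYWORD = re.compile(
    r"\b(insert\s+into|delete\s+from|from|update)\s+(\w+)", re.IGNORECASE
)


def locate_tables(script_text):
    """Return (line_number, column, table_name) for each keyword match."""
    positions = []
    for line_number, line in enumerate(script_text.splitlines(), start=1):
        for match in KEYWORD.finditer(line):
            # Column is the zero-based offset of the table name in the line.
            positions.append((line_number, match.start(2), match.group(2)))
    return positions


script = '#!/bin/sh\nhive -e "insert into report_daily select * from orders"\n'
print(locate_tables(script))
```

Scanning line by line keeps the position information simple, at the cost of missing a table name that sits on the line after its keyword.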
S503, identifying a data table in the Shell script.
In the embodiment of the invention, a data table is a table that the Shell script calls from a database by connecting to the database and issuing SQL statements. The script connects to the database and calls the data table in order to acquire data from the database; in daily operation and maintenance work, acquiring this data makes it possible to monitor certain information in the database and thus to follow the performance of the equipment in real time.
A data table in the Shell script can be identified by recognizing keywords in SQL statements. For example, recognizing the insert statement keyword "insert into" identifies the data table that follows it; recognizing the query statement "select ... from" identifies the data table that follows the from keyword; recognizing the update statement keyword "update" identifies the data table that follows it; and recognizing the delete statement keyword "delete from" identifies the data table that follows it.
S504, extracting the table name of the data table.
In the embodiment of the invention, after a data table in the Shell script is identified, its table name is extracted. For example, in the insert statement "insert into {TABLENAME}", the extracted table name is "TABLENAME"; in the query statement "select * from {USERNAME}", the extracted table name is "USERNAME"; in the update statement "update {DBNAME}", the extracted table name is "DBNAME"; and in the delete statement "delete from {KBNAME}", the extracted table name is "KBNAME".
S505, classifying the data table according to the extracted table name, wherein the data table comprises a source table and a target table.
In the embodiment of the present invention, after the table names are extracted by means of the keywords of a series of SQL statements, they are stored in a temporary file. The types of data table include the source table and the target table, and a data table may be classified according to its table name as follows. If the table name is an independent character string, with a space or line feed before and after it, and is immediately preceded by the from keyword, the data table is a source table. If the table name is an independent character string, with a space or line feed before and after it, and is immediately preceded by a keyword other than from, the data table is a target table. Optionally, the non-from keyword may be an SQL statement keyword such as into or update.
S506, acquiring data information corresponding to the data tables of different types, and outputting the acquired data information of different types to the same preset document.
In the embodiment of the invention, the data table is a Hadoop JOB association table. The JOB association table is written using Hadoop statements and SQL statements and is stored in the corresponding database, and its table name is written into the corresponding Shell script. To identify the JOB association tables, the table names in the Shell script are extracted first; that is, the method identifies which JOB association tables the script involves and whether each of them is a source table or a target table.
It should be noted that a source table is a table inside Hadoop or a table of an external relational database; the character string of a source table has a space or line feed before and after it, and the table name is preceded by the from keyword. Target tables are divided by write mode into insert target tables and overlay target tables; for example, in "insert into tableA" the table tableA is an insert target table, and in "insert overwrite table tableB" the table tableB is an overlay target table. The preset document may be a data table in a preset database. For example, keywords and related content are captured from the scripts of the JOB association tables and recorded in a temporary file; this capture is completed at the HDFS level of Hadoop. The results in the temporary file are then loaded into a Hive table of Hadoop, and the data in the Hive table is output by Sqoop to a specified preset Oracle database, specifically to a pre-established data table in that database. Optionally, the user may build an Oracle Pkg (Oracle package) around the pre-established data table that stores the data information; if the data table needs to be optimized, only the Oracle Pkg needs to be optimized.
Referring to FIG. 6, corresponding to the above Shell-based data table extraction method, an embodiment of the present invention further provides a Shell-based data table extraction terminal, where the terminal 100 includes: an identification unit 101, an extraction unit 102, a classification unit 103, and an acquisition unit 104.
The identification unit 101 is configured to identify a data table in the Shell script. In the embodiment of the invention, a data table is a table that the Shell script calls from a database by connecting to the database and issuing SQL statements. The script connects to the database and calls the data table in order to acquire data from the database; in daily operation and maintenance work, acquiring this data makes it possible to monitor certain information in the database and thus to follow the performance of the equipment in real time.
A data table in the Shell script can be identified by recognizing keywords in SQL statements. For example, recognizing the insert statement keyword "insert into" identifies the data table that follows it; recognizing the query statement "select ... from" identifies the data table that follows the from keyword; recognizing the update statement keyword "update" identifies the data table that follows it; and recognizing the delete statement keyword "delete from" identifies the data table that follows it.
The extraction unit 102 is configured to extract the table name of the data table. In the embodiment of the invention, after a data table in the Shell script is identified, its table name is extracted. For example, in the insert statement "insert into {TABLENAME}", the extracted table name is "TABLENAME"; in the query statement "select * from {USERNAME}", the extracted table name is "USERNAME"; in the update statement "update {DBNAME}", the extracted table name is "DBNAME"; and in the delete statement "delete from {KBNAME}", the extracted table name is "KBNAME".
The classification unit 103 is configured to classify the data table according to the extracted table name, where the data table includes a source table and a target table. In the embodiment of the present invention, after the table names are extracted by means of the keywords of a series of SQL statements, they are stored in a temporary file. The types of data table include the source table and the target table, and a data table may be classified according to its table name as follows. If the table name is an independent character string, with a space or line feed before and after it, and is immediately preceded by the from keyword, the data table is a source table. If the table name is an independent character string, with a space or line feed before and after it, and is immediately preceded by a keyword other than from, the data table is a target table. Optionally, the non-from keyword may be an SQL statement keyword such as into or update.
The obtaining unit 104 is configured to obtain data information corresponding to the different types of data tables, and to output the obtained data information of the different types to the same preset document. In the embodiment of the invention, the data table is a Hadoop JOB association table. The Hadoop JOB association table is written using Hadoop statements and SQL statements and is stored in a corresponding database, and its table name is written into a corresponding Shell script. When the Hadoop JOB association tables need to be identified, the table names in the Shell script are extracted first; that is, it is identified which JOB association tables the script involves and whether each of them is a source table or a target table.
It should be noted that a source table is either a table inside Hadoop or a table of an external relational database; the character string of a source table has a space or a line feed before and after it, and the table name is immediately preceded by the from keyword. A target table is classified by its writing mode as either an insertion target table or an overlay target table; for example, target table a written by "insert into" is an insertion target table, and target table b written by "insert overwrite" is an overlay target table. The preset document may be a data table in a preset database. For example, keywords and related contents are captured from the scripts of the JOB association tables and recorded into a temporary file, where the capturing is completed at the HDFS level of Hadoop; the result in the temporary file is then loaded into a Hive table of Hadoop, and the data information in the Hive table is output by means of Sqoop to a specified preset Oracle database, specifically to a pre-established data table in that database. Optionally, the user may package the pre-established data table storing the data information into an Oracle Pkg (Oracle packaging file); if the data table needs to be optimized, only the Oracle Pkg needs to be optimized.
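As a rough sketch only, the output path described above (temporary file, then Hive table, then Oracle via Sqoop) could be prepared as follows. The helper names `write_temp_file` and `build_commands`, the tab-separated layout, the use of a local temporary file with `LOAD DATA LOCAL INPATH`, and every table name, path and connection string are hypothetical; the commands are only assembled as argument lists and are not executed here.

```python
import csv
import tempfile


def write_temp_file(records):
    """Write (script, table_name, table_type) rows to a tab-separated temp file."""
    with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False,
                                     newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(records)
        return handle.name


def build_commands(temp_path, hive_table, oracle_table, jdbc_url,
                   user, password_file, export_dir):
    """Assemble (but do not run) the Hive load and Sqoop export commands."""
    # Assumption: the temporary file is local, hence LOAD DATA LOCAL INPATH.
    load = ["hive", "-e",
            f"LOAD DATA LOCAL INPATH '{temp_path}' INTO TABLE {hive_table}"]
    # Assumption: export_dir is the HDFS directory backing the Hive table.
    export = ["sqoop", "export",
              "--connect", jdbc_url,
              "--username", user,
              "--password-file", password_file,
              "--table", oracle_table,
              "--export-dir", export_dir,
              "--input-fields-terminated-by", "\\t"]
    return load, export


if __name__ == "__main__":
    path = write_temp_file([("job_a.sh", "tableA", "target"),
                            ("job_a.sh", "src_table", "source")])
    for command in build_commands(path, "job_tables", "JOB_TABLES",
                                  "jdbc:oracle:thin:@//dbhost:1521/ORCL",
                                  "etl_user", "/user/etl/.pw",
                                  "/warehouse/job_tables"):
        print(" ".join(command))
```

Keeping the commands as argument lists leaves it to the caller to decide how and where they are run, for example from the Shell script that schedules the job.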
As can be seen from the above, the embodiment of the present invention identifies the data table in the Shell script; extracts the table name of the data table; classifies the data table according to the extracted table name, wherein the data table comprises a source table and a target table; and acquires data information corresponding to the data tables of different types and outputs the acquired data information of different types to the same preset document. With this improved data table extraction method, the data tables involved in each script no longer need to be looked up laboriously, the sorting and updating process is simplified to the greatest extent, and a large amount of human resources can be saved.
As shown in fig. 7, the classification unit 103 includes:
a determining unit 1031, configured to determine a character string corresponding to the data table name. In the embodiment of the present invention, the character string refers to the string of characters that makes up the table name of the data table; since the table name of the data table may be composed of digits, letters, and underscores, the character string may likewise be composed of digits, letters, and underscores.
A classification subunit 1032, configured to classify the data table according to the character string. In the embodiment of the present invention, the data table is classified according to the character string, and the classification depends on the SQL statement keyword in front of the table name of the data table. The classification method may be: if the table name is an independent character string with a space or a line feed before and after it, and the table name is immediately preceded by the from keyword, the data table is a source table; if the table name is an independent character string with a space or a line feed before and after it, and the table name is immediately preceded by a non-from keyword, the data table is a target table. Optionally, the non-from keyword may be an SQL statement keyword such as into or update. Classifying the data tables by means of the character strings excludes interference from invalid data tables. For example, in from_unixtime the characters following from do not form an independent table name delimited by spaces or line feeds, so from_unixtime is treated as an invalid data table.
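The classification rule above can be illustrated with a short sketch. The regular expression, the function name `classify_tables`, and the treatment of end-of-text as equivalent to a trailing space are assumptions for illustration only; the keyword list is limited to from, into and update as named in the text.

```python
import re

# A candidate is a keyword, whitespace (spaces or line feeds), then an
# independent name of letters, digits and underscores that is followed by
# whitespace or the end of the text.
CANDIDATE = re.compile(
    r"\b(from|into|update)\s+([A-Za-z0-9_]+)(?!\S)", re.IGNORECASE
)


def classify_tables(sql_text):
    """Split table names into source tables (after from) and target tables."""
    result = {"source": [], "target": []}
    for keyword, name in CANDIDATE.findall(sql_text):
        kind = "source" if keyword.lower() == "from" else "target"
        result[kind].append(name)
    return result


if __name__ == "__main__":
    sql = "insert into tableA\nselect from_unixtime(ts) from src_table where ts > 0"
    print(classify_tables(sql))
```

Because `from_unixtime` has no space or line feed between `from` and the characters that follow, it never becomes a candidate and therefore does not appear among the source tables.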
As shown in fig. 8, if the data table is a source table, the obtaining unit 104 includes:
a first execution unit 1041, configured to divide the source table into an internal source table and an external source table. In embodiments of the present invention, the internal source table refers to a table inside Hadoop (e.g., a Hive table of Hadoop), and the external source table refers to a table of an external relational database.
The first obtaining subunit 1042 is configured to obtain data information corresponding to the internal source table and the external source table. In the embodiment of the present invention, the data information includes table information, field information, and the like, where the table information may be a table name, a table type, and the like, and the field information may be a field name, a field type, and the like.
A first output unit 1043, configured to output the acquired data information to a preset document. The preset document may be a preset data table in a preset Oracle database; specifically, the acquired data information may be output to the preset data table in the preset Oracle database, and the user may package the preset data table storing the data information into an Oracle Pkg (Oracle packaging file).
As shown in fig. 9, if the data table is a target table, the obtaining unit 104 includes:
a second execution unit 1044, configured to divide the target table into an insertion target table and an overlay target table. In an embodiment of the present invention, the type of the target table is determined by the SQL statement keyword before the table name of the data table: a target table that follows "into" is an insertion target table, such as the table tableA in the SQL statement "insert into tableA"; a target table that follows "overwrite" is an overlay target table, such as the table tableB in the SQL statement "insert overwrite tableB".
A second obtaining subunit 1045, configured to obtain data information corresponding to the insertion target table and the overlay target table. In the embodiment of the present invention, the data information includes table information, field information, and the like, where the table information may be a table name, a table type, and the like, and the field information may be a field name, a field type, and the like.
A second output unit 1046, configured to output the acquired data information to a preset document. In the embodiment of the invention, the data information is output to a specified Oracle database by means of Sqoop and packaged into an Oracle Pkg; if the data table needs to be optimized, only the Oracle Pkg needs to be optimized.
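As an illustration of how the second execution unit might distinguish the two kinds of target table by the keyword in front of the table name, consider the following sketch. The function name `split_targets` and the regular expression are assumptions; accepting an optional `table` keyword before the name (as in Hive's `INSERT OVERWRITE TABLE`) is likewise an assumption that goes beyond the examples given in the text.

```python
import re

# "insert into <name>" marks an insertion target table and
# "insert overwrite <name>" marks an overlay target table.
TARGET = re.compile(
    r"\binsert\s+(into|overwrite)\s+(?:table\s+)?(\w+)", re.IGNORECASE
)


def split_targets(sql_text):
    """Group target table names by writing mode."""
    kinds = {"insert": [], "overlay": []}
    for mode, name in TARGET.findall(sql_text):
        kinds["insert" if mode.lower() == "into" else "overlay"].append(name)
    return kinds


if __name__ == "__main__":
    sql = "insert into tableA select 1;\ninsert overwrite table tableB select 2;"
    print(split_targets(sql))
```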
Referring to fig. 10, corresponding to the above Shell-based data table extraction method, an embodiment of the present invention further provides a Shell-based data table extraction terminal, where the terminal 200 includes: a traversing unit 201, a positioning unit 202, an identifying unit 203, an extracting unit 204, a classifying unit 205 and an acquiring unit 206.
The traversal unit 201 is configured to traverse the Shell script according to a preset keyword. In the embodiment of the invention, the Shell script is traversed first under the rule for data tables with a short creation time and then under the rule for data tables with a long creation time, so that all data tables of the Shell script are traversed; traversing the Shell script in order from short creation time to long creation time can improve the efficiency of processing the Shell script.
The positioning unit 202 is configured to locate the data table in the Shell script according to the traversal result. In the embodiment of the invention, the traversal result of the Shell script is used to display the position of each data table in the Shell script, and the data table is located according to the displayed position information.
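A minimal sketch of keyword-driven traversal and positioning is given below, assuming that the position of a data table is expressed as the line number on which the keyword and the table name appear. The keyword list, the function name `locate_tables`, and the line-by-line scan (which does not follow statements whose keyword and table name are split across lines) are assumptions of this sketch.

```python
import re

# Hypothetical preset keywords; the embodiment does not fix a particular list.
PRESET_KEYWORDS = ("insert into", "insert overwrite", "from", "update")


def locate_tables(script_text, keywords=PRESET_KEYWORDS):
    """Return (line_number, keyword, table_name) for each hit, in script order."""
    # Allow any run of whitespace between the words of a multi-word keyword.
    alternation = "|".join(
        r"\s+".join(re.escape(word) for word in keyword.split())
        for keyword in keywords
    )
    pattern = re.compile(rf"\b({alternation})\s+(\w+)", re.IGNORECASE)
    hits = []
    for line_number, line in enumerate(script_text.splitlines(), start=1):
        for match in pattern.finditer(line):
            keyword = " ".join(match.group(1).lower().split())
            hits.append((line_number, keyword, match.group(2)))
    return hits


if __name__ == "__main__":
    script = '#!/bin/bash\nhive -e "insert overwrite table_b\nselect id from table_a"\n'
    for hit in locate_tables(script):
        print(hit)
```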
The identification unit 203 is configured to identify the data table in the Shell script. In the embodiment of the invention, the data table refers to a related data table that is called from a database to which the Shell script connects through SQL statements. The Shell script connects to the database and calls the data table in order to acquire data from the database; in daily operation and maintenance work, acquiring data from the database makes it possible to monitor certain information in the database and thus to know the performance of the equipment in real time.
A data table in the Shell script can be identified by recognizing keywords in SQL statements. For example, recognizing the insertion statement "insert into" identifies the data table that follows it; recognizing the query statement "select from" identifies the data table that follows it; recognizing the update statement "update" identifies the data table that follows it; and recognizing the deletion statement "delete from" identifies the data table that follows it, and so on.
An extracting unit 204, configured to extract a table name of the data table. In the embodiment of the invention, after the data table in the Shell script is identified, the table name of the identified data table is extracted. For example, in the insertion statement "insert into {TABLENAME}", the extracted table name is "TABLENAME"; in the query statement "select * from {USERNAME}", the extracted table name is "USERNAME"; in the update statement "update {DBNAME}", the extracted table name is "DBNAME"; and in the delete statement "delete from {KBNAME}", the extracted table name is "KBNAME".
A classifying unit 205, configured to classify the data table according to the extracted table name, where the data table includes a source table and a target table. In the embodiment of the present invention, after the table names of the data tables are extracted by means of a series of SQL statement keywords, the table names are stored in a temporary file. The type of a data table is either source table or target table, and the data table may be classified according to its table name as follows: if the table name is an independent character string with a space or a line feed before and after it, and the table name is immediately preceded by the from keyword, the data table is a source table; if the table name is an independent character string with a space or a line feed before and after it, and the table name is immediately preceded by a non-from keyword, the data table is a target table. Optionally, the non-from keyword may be an SQL statement keyword such as into or update.
The obtaining unit 206 is configured to obtain data information corresponding to the different types of data tables, and to output the obtained data information of the different types to the same preset document. In the embodiment of the invention, the data table is a Hadoop JOB association table. The Hadoop JOB association table is written using Hadoop statements and SQL statements and is stored in a corresponding database, and its table name is written into a corresponding Shell script. When the Hadoop JOB association tables need to be identified, the table names in the Shell script are extracted first; that is, it is identified which JOB association tables the script involves and whether each of them is a source table or a target table.
It should be noted that a source table is either a table inside Hadoop or a table of an external relational database; the character string of a source table has a space or a line feed before and after it, and the table name is immediately preceded by the from keyword. A target table is classified by its writing mode as either an insertion target table or an overlay target table; for example, target table a written by "insert into" is an insertion target table, and target table b written by "insert overwrite" is an overlay target table. The preset document may be a data table in a preset database. For example, keywords and related contents are captured from the scripts of the JOB association tables and recorded into a temporary file, where the capturing is completed at the HDFS level of Hadoop; the result in the temporary file is then loaded into a Hive table of Hadoop, and the data information in the Hive table is output by means of Sqoop to a specified preset Oracle database, specifically to a pre-established data table in that database. Optionally, the user may package the pre-established data table storing the data information into an Oracle Pkg (Oracle packaging file); if the data table needs to be optimized, only the Oracle Pkg needs to be optimized.
In terms of hardware implementation, the above identifying unit 101, extracting unit 102, classifying unit 103, obtaining unit 104, etc. may be embedded in hardware form in the data processing device or be independent of it, or may be stored in software form in a memory of the data processing device, so that the processor can call them to execute the operations corresponding to the above units. The processor may be a Central Processing Unit (CPU), a microprocessor, a single-chip microcomputer, or the like.
The above-mentioned Shell-based data table extraction terminal may be implemented in the form of a computer program, which may be run on a computer device as shown in fig. 11.
Fig. 11 is a schematic structural diagram of a Shell-based data table extraction device according to the present invention. The device can be a terminal or a server, wherein the terminal can be an electronic device with a communication function, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant or a wearable device. The server may be an independent server or a server cluster composed of a plurality of servers. Referring to fig. 11, the computer device 500 includes a processor 502, a nonvolatile storage medium 503, an internal memory 504, and a network interface 505, which are connected by a system bus 501. The non-volatile storage medium 503 of the computer device 500 may store, among other things, an operating system 5031 and a computer program 5032, which, when executed, may cause the processor 502 to perform a Shell-based data table extraction method. The processor 502 of the computer device 500 is used to provide computing and control capabilities that support the overall operation of the computer device 500. The internal memory 504 provides an environment for the execution of the computer program 5032 in the non-volatile storage medium 503, which, when executed by the processor, causes the processor 502 to perform a Shell-based data table extraction method. The network interface 505 of the computer device 500 is used for network communication, such as sending assigned tasks. Those skilled in the art will appreciate that the architecture shown in fig. 11 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or fewer components than those shown, or may combine certain components, or have a different arrangement of components.
Wherein the processor 502 performs the following operations:
identifying a data table in the Shell script;
extracting the table name of the data table;
classifying the data table according to the extracted table name, wherein the data table comprises a source table and a target table;
and acquiring data information corresponding to the data tables of different types, and outputting the acquired data information of different types to the same preset document.
In one embodiment, the processor 502 further performs the following operations:
traversing the Shell script according to preset keywords;
and positioning a data table in the Shell script according to the traversal result.
In one embodiment, the classifying the data table according to the extracted table name includes:
determining a character string corresponding to the data table name;
and classifying the data table according to the character string.
In an embodiment, if the data table is a source table, the acquiring data information corresponding to the data tables of different types, and outputting the acquired data information of different types to the same preset document includes:
dividing the source table into an internal source table and an external source table;
acquiring data information corresponding to the internal source table and the external source table;
and outputting the acquired data information to a preset document.
In an embodiment, if the data table is a target table, the acquiring data information corresponding to the data tables of different types and outputting the acquired data information of different types to the same preset document includes:
dividing the target table into an insertion target table and an overlay target table;
acquiring data information corresponding to the insertion target table and the overlay target table;
and outputting the acquired data information to a preset document.
Those skilled in the art will appreciate that the embodiment of the Shell-based data table extraction facility shown in FIG. 11 does not constitute a limitation on the specific construction of the Shell-based data table extraction facility, and in other embodiments, the Shell-based data table extraction facility may include more or fewer components than shown, or combine certain components, or a different arrangement of components. For example, in some embodiments, the Shell-based data table extraction device only includes a memory and a processor, and in such embodiments, the structures and functions of the memory and the processor are consistent with those of the embodiment shown in fig. 11, and are not described herein again.
The present invention provides a computer readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to perform the steps of:
identifying a data table in the Shell script;
extracting the table name of the data table;
classifying the data table according to the extracted table name, wherein the data table comprises a source table and a target table;
and acquiring data information corresponding to the data tables of different types, and outputting the acquired data information of different types to the same preset document.
In one embodiment, the following steps are also implemented:
traversing the Shell script according to preset keywords;
and positioning a data table in the Shell script according to the traversal result.
In one embodiment, the classifying the data table according to the extracted table name includes:
determining a character string corresponding to the data table name;
and classifying the data table according to the character string.
In an embodiment, if the data table is a source table, the acquiring data information corresponding to the data tables of different types, and outputting the acquired data information of different types to the same preset document includes:
dividing the source table into an internal source table and an external source table;
acquiring data information corresponding to the internal source table and the external source table;
and outputting the acquired data information to a preset document.
In an embodiment, if the data table is a target table, the acquiring data information corresponding to the data tables of different types and outputting the acquired data information of different types to the same preset document includes:
dividing the target table into an insertion target table and an overlay target table;
acquiring data information corresponding to the insertion target table and the overlay target table;
and outputting the acquired data information to a preset document.
The foregoing storage medium of the present invention includes: various media that can store program codes, such as a magnetic disk, an optical disk, and a Read-Only Memory (ROM).
The elements of all embodiments of the present invention may be implemented by a general purpose Integrated Circuit, such as a CPU (Central Processing Unit), or by an ASIC (Application Specific Integrated Circuit).
The steps in the Shell-based data table extraction method in the embodiment of the invention can be reordered, combined and deleted according to actual needs.
The data table extraction terminal based on the Shell in the embodiment of the invention can merge, divide and delete the units according to actual needs.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A data table extraction method based on Shell is characterized by comprising the following steps:
identifying a data table in the Shell script;
extracting the table name of the data table;
classifying the data table according to the extracted table name, wherein the data table comprises a source table and a target table;
acquiring data information corresponding to different types of data tables, and outputting the acquired different types of data information to the same preset document; the data table is a JOB association table; the preset document is a data table which is pre-established in a preset database, and the data table storing the data information is packaged into a packaged file;
outputting the obtained data information of different types to the same preset document, wherein the steps comprise: acquiring corresponding data information from the JOB association table and recording the data information into a temporary file; loading the temporary file into a Hive table of Hadoop; outputting the data information in the Hive table to the preset document in a Sqoop mode;
the classifying the data table according to the extracted table name includes:
judging whether each table name is an independent character string;
if the table name is an independent character string, judging whether spaces or line feeds exist before and after the table name and whether a preset keyword precedes the table name;
if spaces or line feeds exist before and after the table name and the preset keyword precedes the table name, determining the data table corresponding to the table name as a source table;
and if spaces or line feeds exist before and after the table name and the preset keyword does not precede the table name, determining the data table corresponding to the table name as a target table.
2. A method according to claim 1, wherein prior to said identifying a data table in a Shell script, the method further comprises:
traversing the Shell script according to preset keywords;
and positioning a data table in the Shell script according to the traversal result.
3. The method of claim 1, wherein said classifying the data table according to the extracted table name comprises:
determining a character string corresponding to the data table name;
and classifying the data table according to the character string.
4. The method of claim 1, wherein if the data table is a source table, the obtaining data information corresponding to the data tables of different types and outputting the obtained data information of different types to a same preset document comprises:
dividing the source table into an internal source table and an external source table;
acquiring data information corresponding to the internal source table and the external source table;
and outputting the acquired data information to a preset document.
5. The method of claim 1, wherein if the data table is a target table, the obtaining data information corresponding to different types of data tables and outputting the obtained different types of data information to a same preset document comprises:
dividing the target table into an insertion target table and an overlay target table;
acquiring data information corresponding to the insertion target table and the overlay target table;
and outputting the acquired data information to a preset document.
6. A Shell-based data table extraction terminal, the terminal comprising:
the identification unit is used for identifying a data table in the Shell script;
the extracting unit is used for extracting the table name of the data table;
the classification unit is used for classifying the data table according to the extracted table name, wherein the data table comprises a source table and a target table;
the acquisition unit is used for acquiring data information corresponding to different types of data tables and outputting the acquired data information of different types to the same preset document; the data table is a JOB association table; the preset document is a data table which is pre-established in a preset database, and the data table storing the data information is packaged into a packaged file;
outputting the obtained data information of different types to the same preset document, wherein the steps comprise: acquiring corresponding data information from the JOB association table and recording the data information into a temporary file; loading the temporary file into a Hive table of Hadoop; outputting the data information in the Hive table to the preset document in a Sqoop mode;
the classifying the data table according to the extracted table name includes:
judging whether each table name is an independent character string;
if the table name is an independent character string, judging whether spaces or line feeds exist before and after the table name and whether a preset keyword precedes the table name;
if spaces or line feeds exist before and after the table name and the preset keyword precedes the table name, determining the data table corresponding to the table name as a source table;
and if spaces or line feeds exist before and after the table name and the preset keyword does not precede the table name, determining the data table corresponding to the table name as a target table.
7. The terminal of claim 6, wherein the terminal further comprises:
the traversal unit is used for traversing the Shell script according to preset keywords;
and the positioning unit is used for positioning the data table in the Shell script according to the traversal result.
8. The terminal of claim 6, wherein the classification unit comprises:
a determination unit configured to determine a character string corresponding to the data table name;
and the classification subunit is used for classifying the data table according to the character string.
9. A Shell-based data table extraction device, comprising:
a memory for storing a program for implementing the data table extraction method; and
a processor for executing a program stored in the memory for implementing a data table extraction method to perform the method of any one of claims 1-5.
10. A computer-readable storage medium, storing one or more programs, the one or more programs being executable by one or more processors to perform the method of any one of claims 1-5.
CN201810196485.4A 2018-02-24 2018-03-09 Shell-based data table extraction method, terminal, equipment and storage medium Active CN108536745B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/101880 WO2019161645A1 (en) 2018-02-24 2018-08-23 Shell-based data table extraction method, terminal, device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810156612 2018-02-24
CN2018101566128 2018-02-24

Publications (2)

Publication Number Publication Date
CN108536745A CN108536745A (en) 2018-09-14
CN108536745B true CN108536745B (en) 2021-03-16

Family

ID=63483448

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810196485.4A Active CN108536745B (en) 2018-02-24 2018-03-09 Shell-based data table extraction method, terminal, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN108536745B (en)
WO (1) WO2019161645A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109359160A (en) * 2018-10-12 2019-02-19 平安科技(深圳)有限公司 Method of data synchronization, device, computer equipment and storage medium
CN110647564B (en) * 2019-08-14 2023-11-24 中国平安财产保险股份有限公司 Hive table building method, electronic device and computer readable storage medium
CN111460241B (en) * 2020-04-26 2024-01-23 甬矽电子(宁波)股份有限公司 Data query method and device, electronic equipment and storage medium
CN111767350A (en) * 2020-06-30 2020-10-13 平安国际智慧城市科技股份有限公司 Data warehouse testing method and device, terminal equipment and storage medium
CN111984659B (en) * 2020-07-28 2023-07-21 招联消费金融有限公司 Data updating method, device, computer equipment and storage medium
CN113190603A (en) * 2021-04-28 2021-07-30 中国邮政储蓄银行股份有限公司 Data processing method, data processing device, computer readable storage medium and processor
CN116578651B (en) * 2023-07-12 2023-11-17 北京集度科技有限公司 Data table structure synchronization method, system and equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101944128A (en) * 2010-09-25 2011-01-12 中兴通讯股份有限公司 Data export and import method and device
CN104536987A (en) * 2014-12-08 2015-04-22 联动优势电子商务有限公司 Data query method and device
CN104866595A (en) * 2015-05-29 2015-08-26 北京京东尚科信息技术有限公司 Method and apparatus for adding transaction control to relational database script
CN105868204A (en) * 2015-01-21 2016-08-17 中国移动(深圳)有限公司 Method and apparatus for converting script language SQL of Oracle

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11265293A (en) * 1998-03-17 1999-09-28 Nec Corp Script processor
CN102375826B (en) * 2010-08-13 2014-12-31 中国移动通信集团公司 Structured query language script analysis method, device and system
US8612487B2 (en) * 2011-09-07 2013-12-17 International Business Machines Corporation Transforming hierarchical language data into relational form
US8589450B2 (en) * 2011-12-28 2013-11-19 Business Objects Software Limited Mapping non-relational database objects into a relational database model
CN107169023A (en) * 2017-04-07 2017-09-15 广东精点数据科技股份有限公司 Data lineage analysis system and method based on SQL semantic automatic analysis

Also Published As

Publication number Publication date
CN108536745A (en) 2018-09-14
WO2019161645A1 (en) 2019-08-29

Similar Documents

Publication Publication Date Title
CN108536745B (en) Shell-based data table extraction method, terminal, equipment and storage medium
EP3113043B1 (en) Method, device and host for updating metadata stored in columns in distributed file system
CN108388515B (en) Test data generation method, device, equipment and computer readable storage medium
CN109657177A (en) Method and device for generating a page after upgrading, storage medium and computer equipment
WO2021217846A1 (en) Interface data processing method and apparatus, and computer device and storage medium
CN111177113B (en) Data migration method, device, computer equipment and storage medium
CN106648569B (en) Target serialization realization method and device
CN108415998B (en) Application dependency relationship updating method, terminal, device and storage medium
CN110705226A (en) Spreadsheet creating method and device and computer equipment
CN110941779A (en) Page loading method and device, storage medium and electronic equipment
CN110442585B (en) Data updating method, data updating device, computer equipment and storage medium
CN110222046B (en) List data processing method, device, server and storage medium
CN110888972A (en) Sensitive content identification method and device based on Spark Streaming
CN111984659B (en) Data updating method, device, computer equipment and storage medium
CN113553458A (en) Data export method and device in graph database
CN111552527A (en) Method, device and system for translating characters in user interface and storage medium
CN112328272A (en) Algorithm upgrading method, device, equipment and storage medium
CN113434673A (en) Data processing method, computer-readable storage medium and electronic device
CN108334621B (en) Database operation method, device, equipment and computer readable storage medium
CN112711435A (en) Version updating method, version updating device, electronic equipment and storage medium
CN111651531A (en) Data import method, device, equipment and computer storage medium
US10795875B2 (en) Data storing method using multi-version based data structure
CN115840786B (en) Data lake data synchronization method and device
CN104794227A (en) Information matching method and device
CN113946517A (en) Abnormal data determination method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant